
Revised: August 25, 2017

Mechanizing Alice: Automating the Subject Matter Eligibility Test

of Alice v. CLS Bank

Ben Dugan**

1. INTRODUCTION AND OVERVIEW

In Alice v. CLS Bank, the Supreme Court established a new test for determining whether a patent claim is directed to patent-eligible subject matter.1 The impact of the Court’s action is profound: the modified standard means that many formerly valid patents are now invalid, and that many pending patent applications that would have been granted under the old standard will now not be granted.

This article describes a project to mechanize the subject matter eligibility test of Alice v. CLS Bank. The Alice test asks a human to determine whether or not a patent claim is directed to patent-eligible subject matter. The core research question addressed by this article is whether it is possible to automate the Alice test. Is it possible to build a machine that takes a patent claim as input and outputs an indication that the claim passes or fails the Alice test? We show that it is possible to implement just such a machine, by casting the Alice test as a classification problem that is amenable to machine learning.

This article describes the design, development, and applications of a machine classifier that approximates the Alice test. Our machine classifier is a computer program that takes the text of a patent claim as input, and indicates whether or not the claim passes the Alice test. We employ supervised machine learning to construct the classifier.2 Supervised machine learning is a technique for training a computer program to recognize patterns.3 Training comprises presenting the program with positive and negative examples, and automatically adjusting associations between particular features in those examples and the desired output.4
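To make the supervised-learning setup concrete, the sketch below trains a simple bag-of-words classifier on labeled claim text using the scikit-learn library. It is only an illustration: the file name, column names, and choice of model are assumptions, not the configuration used to produce the results reported in this article. The features and models actually evaluated are described in Section 6.

```python
# Minimal sketch of supervised learning over labeled patent claims.
# Assumes a CSV file "claims.csv" with columns "claim_text" and "label"
# (label is "ACCEPT" or "REJECT"); the file name and columns are hypothetical.
import csv

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts, labels = [], []
with open("claims.csv", newline="") as f:
    for row in csv.DictReader(f):
        texts.append(row["claim_text"])
        labels.append(row["label"])

train_x, test_x, train_y, test_y = train_test_split(texts, labels, test_size=0.2)

# Bag-of-words features plus a linear model: training adjusts associations
# between claim terms and the examiner's eligibility decision.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(train_x, train_y)

print("held-out accuracy:", model.score(test_x, test_y))
print(model.predict(["A method comprising hedging risk by ..."]))
```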

The examples we use to train our machine classifier are obtained from the United States Patent Office.

An early discussion draft of this article appeared as Estimating the Impact of Alice v. CLS Bank Based on a Statistical Analysis of Patent Office Subject Matter Rejections (February 23, 2016). Available at SSRN: https://ssrn.com/abstract=2730803. This article significantly refines the statistical analysis of subject matter rejections at the Patent Office. This article also clarifies the performance results of our machine classifier, and better accounts for classifier performance when estimating the number of patents invalidated under Alice v. CLS Bank. ** Member, Lowe Graham Jones, PLLC. Affiliate Instructor of Law, University of Washington School of Law. Opinions expressed herein are those of the author only. Copyright 2017 Ben Dugan. I would like to thank Bob Dugan and Jane Winn for their feedback, advice, and support, and Sarah Dugan for her love and encouragement. 1 Alice Corp. v. CLS Bank, Int’l, 134 S. Ct. 2347 (2014). 2 STUART RUSSELL & PETER NORVIG, ARTIFICIAL INTELLIGENCE: A MODERN APPROACH 693-95 (3d ed. 2010). 3 Id. 4 Id.


Within a few months of the Alice decision, examiners at the Patent Office began reviewing claims in patent applications for subject matter compliance under the new framework.5 Each decision of an examiner is publicly reported in the form of a written office action.6 We programmatically obtained and reviewed many thousands of these office actions to build a data set that associates patent claims with corresponding eligibility decisions. We then used this dataset to train, test, and validate our machine classifier.

A. Table of Contents

1. Introduction and Overview
A. Table of Contents
B. Organization of the Article
2. Brief Review of the Alice Framework
3. Rendering Legal Services in the Shadow of Alice
A. Intuition-Based Legal Services
B. Data-Driven Patent Legal Services
C. Predicting Subject Matter Rejections Yields Economic Efficiencies
4. Data Collection Methodology
5. Data Analysis Results
6. Predicting Alice Rejections with Machine Classification
A. Word Clouds
B. Classifier Training
C. Performance of a Baseline Classifier
D. Performance of an Improved Classifier
E. Extensions, Improvements, and Future Work
7. A Patent Claim Evaluation System
A. System Description
B. Claim Evaluation System Use Cases
C. Questions Arising From the Application of Machine Intelligence to the Law
8. Estimating the Impact of Alice on Issued Patents
A. The Classifier
B. Classifier Validation
C. Evaluation of Issued Patent Claims
9. Conclusion

5 See, e.g., USPTO, Preliminary Examination Instructions in View of the Supreme Court Decision in Alice v. CLS Bank (June 25, 2014), http://www.uspto.gov/sites/default/files/patents/announce/alice_pec_25jun2014.pdf. See generally, USPTO, Subject Matter Eligibility, https://www.uspto.gov/patent/laws-and-regulations/examination-policy/subject-matter-eligibility [hereinafter Preliminary Examination Instructions]. 6 35 U.S.C. § 132; 37 C.F.R. 1.104; MPEP 706.

B. Organization of the Article

This article is organized in the following manner. In Section 2, we provide an overview of the Alice framework for determining the subject matter eligibility of a patent claim. The Alice test first asks whether a given patent claim is directed to a non-patentable law of nature, natural phenomenon, or abstract idea.7 If so, the claim is not patent eligible unless the claim recites additional elements that amount to significantly more than the recited non-patentable concept.8

In Section 3, we motivate a computer-assisted approach for rendering legal advice in the context of Alice. Alice creates a new patentability question that must be answered before and during the preparation, prosecution, and enforcement of a patent. Section 3 provides inspiration for a data-driven, computer-assisted, predictive approach for efficiently answering the Alice patentability question. Such a predictive approach can be usefully performed at various stages of the lifecycle of a patent, including during initial invention analysis, application preparation and claim development, and litigation risk analysis. Computer-assisted prediction of Alice rejections stands in contrast to traditional, intuition-driven methods of legal work, and can yield considerable economic efficiencies, by eliminating the legal fees associated with the preparation and prosecution of applications for unpatentable inventions, or by eliminating baseless litigation of invalid patent claims. In addition, a predictive approach can be used to assist a patent practitioner in crafting patent claims that are less likely to be subjected to Alice rejections, thereby reducing the number of applicant-examiner interactions and corresponding legal fees during examination.

In Section 4, we describe our data collection methodology. Section 4 lays out our process for generating a dataset for training our machine classifier. In brief, we automatically download thousands of patent application file histories, each of which is a record of the interaction between a patent examiner and an applicant. From these file histories, we extract office actions, each of which is a written record of an examiner’s analysis and decision of a particular application. We then process the extracted office actions, to determine whether the examiner has accepted or rejected the claims of the application under Alice. Finally, we construct our dataset with the obtained information. Our dataset is a table that associates, in each row, a patent claim with an indication of whether the claim passes or fails the Alice test, as decided by a patent examiner.
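Concretely, each row of the dataset pairs one claim with one examiner decision. The fragment below is hypothetical (invented application numbers and abbreviated claim text) and is shown only to fix the shape of the table; the ACCEPT/REJECT labels follow the convention described in Section 4.

```python
# Hypothetical rows of the training table; application numbers and claim
# text are invented for illustration only.
dataset = [
    {"application": "14/000,001",
     "claim_1": "A method of mitigating settlement risk, comprising ...",
     "label": "REJECT"},   # office action contained an Alice rejection
    {"application": "14/000,002",
     "claim_1": "An apparatus comprising a sensor configured to ...",
     "label": "ACCEPT"},   # no Alice rejection in the office action
]
```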

In Section 5, we present results from an analysis of our dataset. Our analysis identifies trends and subject matter areas that are disproportionately subject to rejections under Alice.

7 Alice, 134 S. Ct. at 2354. 8 Id. at 2354-56.


Our dataset shows that the subject matter areas that contain many applications with Alice rejections include data processing, business methods, games, educational methods, and speech processing. This result is consistent with the focus of the Alice test on detecting claims that are directed to abstract ideas, including concepts such as economic practices, methods of organizing human activity, and mathematical relationships.9

In Section 6, we build a machine that is capable of predicting whether a claim is likely to pass the Alice test. In this section, we initially perform an analysis that identifies particular words that are associated with eligibility or ineligibility under Alice. The presence of such associations indicates that there exist patterns that can be learned by way of machine learning. Next, we describe the training, testing, and performance of a baseline classifier. Our classifiers are trained in a supervised manner using as examples the thousands of subject matter patentability decisions made by examiners at the Patent Office. We then describe an improved classifier that uses an ensemble of multiple distinct classifiers to improve upon the performance of our baseline classifier. We conclude this section with a brief outline of possible extensions, improvements, and future work.

In Section 7, we describe a claim evaluation system. The system is a Web-based application that takes a patent claim as input from a user, and provides the text of the claim to a back-end classifier trained as described above. The system provides the decision of the classifier as output to the user. It is envisioned that a system such as this can be used by a patent practitioner to provide improved Alice-related legal services at various stages of the lifecycle of a patent, as discussed in Section 3.

In Section 8, we utilize our machine classifier to quantitatively estimate the impact of Alice on the universe of issued patents. While other studies have tracked the actual impact of Alice in cases before the Federal Courts, our effort is the first to quantitatively estimate the impact of Alice on the entire body of issued patents.10 To obtain our estimate, we first determine whether our classifier can be used as a proxy for the decision-making of the Federal Courts. Since our classifier is trained based on decisions made by examiners at the Patent Office, it is natural to ask whether the classifier reasonably approximates the way that the courts apply the Alice test. To answer this question, we evaluate the performance of our classifier on patent claims that have been analyzed by the Court of Appeals for the Federal Circuit. The results of this evaluation show that the outputs produced by our classifier are largely in agreement with the decisions of the CAFC.

Finally, we turn our classifier to the task of processing claims from a random sample of 40,000 issued patents dating back to 1996. Extrapolating the results obtained from our sample, we estimate that as many as 100,000 issued patents have been invalidated due to the reduced scope of patent-eligible subject matter under Alice v. CLS Bank. This large-scale invalidation of patent rights represents a significant realignment of intellectual property rights at the stroke of a judge’s pen.

9 Id. at 2354-56. 10 Jasper Tran, Two Years After Alice v. CLS Bank, 98 JOURNAL OF THE PATENT AND TRADEMARK OFFICE SOCIETY 354, 358 (2016).


2. BRIEF REVIEW OF THE ALICE FRAMEWORK

The following procedure outlines the current test for evaluating a patent claim for subject matter eligibility under 35 U.S.C. § 101. We will refer to this test as the “Alice test,” although it was earlier articulated by the Supreme Court in Mayo Collaborative Services v. Prometheus Laboratories, Inc.11

Step 1: Is the claim to a process, machine, manufacture, or composition of matter? If YES, proceed to Step 2A; if NO, the claim is not eligible subject matter under 35 U.S.C. § 101.

Step 2A: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea? If YES, proceed to step 2B; if NO, the claim qualifies as eligible subject matter under 35 U.S.C. § 101.

Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? If YES, the claim is eligible; if NO, the claim is ineligible.12
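For readers who think procedurally, the branching structure of the test can be restated as a short function. This is only a restatement of Steps 1, 2A, and 2B: the three boolean inputs stand in for legal judgments that a human examiner or judge must still supply, and nothing in the sketch performs those judgments.

```python
def alice_eligible(recites_statutory_category: bool,
                   directed_to_judicial_exception: bool,
                   significantly_more: bool) -> bool:
    """Branching structure of the Alice/Mayo test; the inputs are
    placeholders for legal judgments, not computed values."""
    if not recites_statutory_category:       # Step 1
        return False
    if not directed_to_judicial_exception:   # Step 2A
        return True
    return significantly_more                # Step 2B

# A claim reciting a machine, directed to an abstract idea, and adding
# nothing significantly more is ineligible under the test.
assert alice_eligible(True, True, False) is False
```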

The test has two main parts. The first part of the test, in Step 1, asks whether the claim is to a process, manufacture, machine, or composition of matter. This is simply applying the plain text of Section 101 of the patent statute to ask whether a patentable “thing” is being claimed.13 As a general matter, this part of the test is easy to satisfy. If the claim recites something that is recognizable as an apparatus/machine, process, manufacture, or composition of matter, Step 1 of the test should be satisfied. If Step 1 of the test is not satisfied, the claim is not eligible, end of analysis.14

The second part of the test attempts to identify claims that are directed to judicial exceptions to the statutory categories.15 The second part of the test has two subparts. Step 2A is designed to ferret out claims that, on their surface, claim something that is patent eligible (e.g., a computer), but contain within them a judicial exception.

11 Mayo Collaborative Services v. Prometheus Labs., Inc., 132 S. Ct. 1289 (2012) (addressing a method for administering a drug, and holding that a newly discovered law of nature is unpatentable and that the application of that law is also normally unpatentable if the application merely relies on elements already known in the art); Alice at 2355-60 (applying the Mayo analysis to claims to a computer system and method for electronic escrow; holding the claims invalid because they were directed to an abstract idea, and did not include sufficiently more to transform the abstract idea into a patent-eligible invention). 12 USPTO, 2014 Interim Guidance on Patent Subject Matter Eligibility, 79 FR 74618, 74621 (December 16, 2014) [hereinafter 2014 Guidance]. 13 35 U.S.C. § 101 (“Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor.”). 14 E.g., In re Ferguson, 558 F.3d 1359, 1364-66 (Fed. Cir. 2009) (contractual agreements and companies are not patentable subject matter); In re Nuijten, 500 F.3d 1346, 1357 (Fed. Cir. 2007) (transitory signals are not patentable subject matter). 15 Alice, 134 S. Ct. at 2354.


Step 2A asks whether the claim is directed to one of the judicial exceptions. If not, then the claim qualifies as patent eligible. If so, Step 2B must be evaluated.

The judicial exceptions in Step 2A include laws of nature, abstract ideas, and natural phenomena.16 The category of abstract ideas can be broken down into four subcategories: fundamental economic practices, ideas in and of themselves, certain methods of organizing human activity, and mathematical relationships and formulas.17 Fundamental economic practices include, for example, creating contractual relationships, hedging, or mitigating settlement risk.18 Ideas in and of themselves include, for example, collecting and comparing known information, diagnosing a condition by performing a test and thinking about the results, and organizing information through mathematical correlation.19 Methods of organizing human activity include, for example, creating contractual relationships, hedging, mitigating settlement risk, or managing a game of bingo.20 Mathematical relationships and formulas include, for example, an algorithm for converting number formats, a formula for computing alarm limits, or the Arrhenius equation.21

In Step 2B, the test asks whether the claim recites additional elements that amount to “significantly more” than the judicial exception. In the computing context, this part of the test is trying to catch claims that merely apply an abstract idea within a computing system, without adding significant additional elements or limitations.22 Limitations that may be enough to qualify as “significantly more” when recited in a claim with a judicial exception include, for example: improvements to another technology or technical field; improvements to the functioning of the computer itself; effecting a transformation or reduction of a particular article to a different state or thing; or adding unconventional steps that confine the claim to a particular useful application.23

Limitations that have been found not to be enough to qualify as “significantly more” when recited in a claim with a judicial exception include, for example: adding the words “apply it” with the judicial exception; mere instructions to implement an abstract idea on a computer; simply appending well-understood, routine and conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception; or adding insignificant extra-solution activity to the judicial exception.24

16 Id. at 2354. 17 Id. at 2355-56. 18 E.g., Bilski v. Kappos, 561 U.S. 593 (2010) (mitigating settlement risk). 19 E.g., Digitech Image Tech., LLC v. Electronics for Imaging, Inc., 758 F.3d 1344 (Fed. Cir. 2014) (organizing information through mathematical correlations). 20 E.g., buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 112 (Fed. Cir. 2014) (contractual relationships). 21 E.g., Gottschalk v. Benson, 409 U.S. 63 (1972) (algorithm for converting number formats); Diamond v. Diehr, 450 U.S. 175 (1981) (Arrhenius equation). 22 Alice, 134 S. Ct. at 2357-58. 23 2014 Guidance, supra note 12, at 74624, citations omitted. 24 Id.


The Alice test is now being applied by federal agencies and courts at the beginning and end of the patent lifecycle. With respect to the application phase of a patent, shortly after the Alice decision, the Patent Office issued instructions to the examining corps for implementing the Alice test.25 These preliminary instructions were supplemented in December, 2014 by the 2014 Guidance.26 As we will show in Section 5, below, the Patent Office has applied this test widely, with significant numbers of rejections appearing in specific subject matter areas.

With respect to the enforcement phase of the patent lifecycle, the Federal Courts have been actively applying the Alice test to analyze the validity of patent claims in the litigation context.27 As of June, 2016, over 500 patents had been challenged under Alice, with a resulting invalidation rate exceeding 65%.28 The Court of Appeals for the Federal Circuit has itself heard over 50 appeals raising the Alice issue.29

Note that when we speak of the “Alice test” in the context of the Patent Office we include the entire body of case law that has developed in the wake of the Mayo and Alice decisions.30 The cases following Alice have refined and clarified the Alice two-step analysis with respect to particular fact contexts. The Patent Office has made considerable effort to keep abreast of these decisions and to train the examining corps as to their import.31 To a large degree, then, the Patent Office embodies the current state of subject matter eligibility law. And while this law is never static, it is also not changing so quickly as to undermine one of the central premises of this article, which is that the Patent Office can be used as a source of examples of a decision maker (in this case, a sort of “hive mind” comprising many thousands of individual examiners) applying a legal rule to determine whether a patent claim is subject matter eligible. Assuming that the application of the rule is not completely random (as we will show in Section 5), it should be possible to train a machine to learn the rule (or its approximation) based on our collection of examples.

25 Preliminary Examination Instructions, supra note 5. 26 2014 Guidance, supra note 12. 27 At the time of this writing, a Shepard’s Report indicates that Alice has been cited in over 500 Federal Court decisions. Lexis Search (May 2017). 28 Tran, supra note 10, at 358. 29 USPTO, Chart of Subject Matter Eligibility Court Decisions (updated July 31, 2017), https://www.uspto.gov/sites/default/files/documents/ieg-sme_crt_dec.xlsx. 30 E.g., Ultramercial, Inc. v. Hulu, LLC, 772 F.3d 709 (Fed. Cir. 2014); DDR Holdings, LLC v. Hotels.com, L.P., 773 F.3d 1245 (Fed. Cir. 2014); Enfish LLC v. Microsoft Corp., 822 F.3d 1327 (Fed. Cir. 2016); Bascom Global Internet Services, Inc. v. AT&T Mobility LLC, 827 F.3d 1341 (Fed. Cir. 2016); McRO, Inc. v. Bandai Namco Games America Inc., 837 F.3d 1299 (Fed. Cir. 2016); Amdocs (Israel) Ltd. v. Openet Telecom, Inc., 841 F.3d 1288 (Fed. Cir. 2016). 31 The Patent Office has released a number of memoranda discussing decisions of the Court of Appeals for the Federal Circuit, including Enfish, McRO, and Bascom. USPTO, Recent Subject Matter Eligibility Decisions (May 19, 2016), https://www.uspto.gov/sites/default/files/documents/ieg-may-2016_enfish_memo.pdf; USPTO, Recent Subject Matter Eligibility Decisions (November 2, 2016), https://www.uspto.gov/sites/default/files/documents/McRo-Bascom-Memo.pdf.


3. RENDERING LEGAL SERVICES IN THE SHADOW OF ALICE

In this section, we motivate a computer-assisted approach for rendering legal advice in the context of Alice. Alice creates a new patentability question that must be answered before and during the preparation, prosecution, and enforcement of a patent. Increased access to data allows us to implement a data-driven, predictive computer system for efficiently answering the Alice patentability question, possibly yielding economic efficiencies.

Alice casts a shadow over virtually every phase of the lifecycle of a patent, including preparation, prosecution, and enforcement. Inventors want to understand as an initial matter whether to even attempt to obtain patent protection for their inventions. The cost to prepare and file a patent application of moderate complexity can easily exceed $10,000, and inventors would like to know whether it is worth it even to begin such an undertaking.32

In addition, there are hundreds of thousands of “in flight” patent applications, all prepared and filed prior to the Alice decision. These applications likely do not include the necessary subject matter or level of detail that may be required to overcome a current or impending Alice rejection. These applications may not contain evidence of how the invention improves the operation of a computing system or other technology. In such cases, patent applicants want to know whether it is even worth continuing the fight, given that they must pay thousands of dollars for every meaningful interaction with a patent examiner.33

In the enforcement phase of the patent lifecycle, litigants want to know the likelihood that an asserted patent will be invalidated under Alice. Both parties to a suit rely on such information when deciding whether to settle or continue towards trial. For plaintiffs, the increased likelihood of fee shifting raises the stakes even further.34 From an economic welfare perspective, providing patentees with accurate information regarding the likelihood of invalidation should result in a reduction in the inefficient allocation of resources, by shortening or reducing the number of lawsuits.

A. Intuition-Based Legal Services

Historically, attorneys have provided the above-described guidance by applying intuition, folk wisdom, heuristics, and their personal and shared historical experience.

32 American Intellectual Property Law Association, 2015 REPORT OF THE ECONOMIC SURVEY, I-85 (median legal fee to draft a relatively complex electrical/computer patent application is $10,000). 33 Id. at I-86 (median legal fee to prepare a response to an examiner’s rejection for a relatively complex electrical/computer application is $3,000). 34 Octane Fitness, LLC v. Icon Health & Fitness, Inc., 134 S. Ct. 1749 (2014). See e.g., Edekka LLC v. 3Balls.com, Inc., E.D. Texas Case 2:15-cv-00541-JRG, Document No. 133, Order by Judge Gilstrap awarding attorney fees under 35 U.S.C. § 285 in a case dismissed for claims found invalid under Alice.


For example, in the context of patent prosecution generally, the field is rife with (often conflicting) guiding principles,35 such as:

• Make every argument you possibly can
• To advance prosecution, amending claims is better than arguing
• Keep argument to a minimum, for fear of creating prosecution history estoppel or disclaimer
• File appeals early and often
• Interviewing the examiner expedites examination
• Interviewing the examiner is a waste of time and money
• Use prioritized examination – you’ll get a patent in 12 months!36
• You’re playing a lottery: if your case is assigned to a bad examiner, give up hope!

Unfortunately, the above approaches are not necessarily effective or applicable in all contexts. For example, while some approaches may have worked in the past (e.g., during the first years of practice when the attorney received her training), they may no longer be effective, given changes in Patent Office procedures and training, changes in the law, and so on.

Nor do the above approaches necessarily consider client goals. Different clients may desire different outcomes, depending on their market, funding needs, budget, and the like. Example client requirements include short prosecution time (e.g., get a patent as quickly as possible), long prosecution time (e.g., delay prosecution during clinical trials), obtaining broad claims, minimizing the number of office actions (because each office action costs the client money), or the like. It is clear that any one maxim or approach to patent prosecution is not going to optimize the outcome for every client in every possible instance. While a truly optimal outcome may not be possible, in view of the randomness and variability in the examination system, it is undoubtedly possible to do better. In the following subsection, we assert that a data-driven approach can yield improved outcomes and economic efficiencies for the client.

B. Data-Driven Patent Legal Services

A data-driven approach promises to address at least some of the shortcomings associated with the traditional approach to providing patent-related legal services. As a simple example, many clients are concerned with the number of office actions required to obtain a patent. This is because each office action may cost the client around $3,000 in attorney fees to formulate a response.37

35 The following list is based on the author’s personal experience as a patent prosecutor. At one time or another the author has worked with a client, supervisor, or colleague who has insisted on following one or more of the presented guidelines. 36 Prioritized examination is a Patent Office program that promises to provide a final disposition for a patent application within one year. USPTO, Prioritized Examination, 76 FR 59050 (September 23, 2011). 37 American Intellectual Property Law Association, supra note 32, at I-86 (median legal fee to prepare an amendment/argument for a relatively complex electrical/computer application is $3,000).


For large clients, with portfolios numbering in the thousands of yearly applications, reducing the average number of office actions (even by a fractional amount) can yield significant savings in yearly fees to outside counsel. For small clients and individual inventors, one less office action may be the difference between pushing forward and abandoning a case. Is it possible to use data about the functioning of the Patent Office to better address the needs of these different types of clients?

In the academic context, prior studies considering patent-related data have focused largely on understanding or measuring patent breadth, quality, and/or value using empirical patent features. One body of literature uses patent citation counts and other features (e.g., claim count, classification identifiers) of an issued patent to attempt to determine patent value.38 Others have studied the relationship between patent scope and firm value.39 Other empirical work has analyzed prosecution-related data in order to determine patent quality.40

For this project, we are more interested in predicting how decision makers (e.g., judges or patent examiners) will evaluate patent claims. We make such predictions based on the prior behaviors and actions of those decision makers. Fortunately, it is now becoming increasingly possible to cheaply obtain and analyze large quantities of data about the behaviors of patent examiners and judges.

In the patent prosecution context, the Patent Office hosts the PAIR (Patent Application Information Retrieval) system, which provides the “file wrapper” for every published application or issued patent.41 The patent file wrapper includes every document, starting with the initial application filing, filed by the applicant or examiner during prosecution of a given patent application.42

A number of commercial entities provide services that track and analyze prosecution-related data.43 These services provide reports that summarize examiner- or group-specific behaviors and trends within the Patent Office, including allowance rates, appeal dispositions, timing information, and the like.44 Such information can be used to tailor prosecution techniques to a specific examiner or examining group.

38 See e.g., Mark Carpenter et al., Citation Rates to Technologically Important Patents, 3 WORLD PATENT INFORMATION 160 (1981); John Allison et al., Valuable Patents, 92 GEO. L.J. 435 (2004); Nathan Falk and Kenneth Train, Patent Valuation with Forecasts of Forward Citations, JOURNAL OF BUSINESS VALUATION AND ECONOMIC LOSS ANALYSIS (2016). 39 See e.g., Joshua Lerner, The Importance of Patent Scope: An Empirical Analysis, 25 RAND JOURNAL OF ECONOMICS 319 (1994) (patent classification is used as a proxy for scope) 40 See e.g., Ronald Mann and Marian Underweiser, A New Look at Patent Quality: Relating Patent Prosecution to Validity, 9 J. EMPIRICAL LEGAL STUD. 1 (2012). 41 Patent Application Retrieval System, http://portal.uspto.gov/pair/PublicPair. In addition, bulk data downloads are available at: Google USPTO Bulk Downloads, https://www.google.com/googlebooks/uspto-patents.html; Reed Tech USPTO Data Sets: http://patents.reedtech.com/index.php. 42 37 C.F.R. § 1.2; Manual of Patent Examining Procedure § 719. 43 E.g., Juristat, https://www.juristat.com/; LexisNexis PatentAdvisor, http://www.reedtech.com/products-services/intellectual-property-solutions/lexisnexis-patentadvisor. 44 Juristat, Juristat Primer, https://www.juristat.com/primers/.


For example, if the examiner assigned to a particular application has, based on his work on other cases, shown himself to be stubborn (e.g., as evidenced by a high appeal rate, a high number of office actions per allowance, or the like), then the client may elect to appeal the case earlier than usual, given that further interaction with the examiner may be of limited utility.

In the context of Alice, we can learn many things from patent prosecution data. As one example, we can learn which art units or subject matter classes are subject to the most Alice rejections. While this is useful, it is not always known a priori how a new application will be classified by the Patent Office. As another example, we can learn which examiners are particularly prone to issue Alice rejections, and perhaps more interestingly, how likely an applicant is to overcome that rejection based on the examiner’s decisions in other cases. Dissecting the data even further, we may even be able to learn what types of arguments are successful in overcoming Alice rejections.

C. Predicting Subject Matter Rejections Yields Economic Efficiencies

While the above types of information may be valuable to an applicant in the midst of examination, they are not so useful in the pre-application or post-issuance phases of the lifecycle of a typical patent. A client wishing to file an application for an invention will want to know how likely he is to encounter an Alice rejection. As another example, a client with an issued patent will want to know how likely it is that her patent will be invalidated by a court.

In view of the above, the goal of this work is to predict whether a particular patent claim will be considered valid or invalid under Alice, based on patent prosecution-related data obtained from the Patent Office. As described in detail below, such a prediction can be made based on relationships between specific claim terms and the presence or absence of corresponding subject matter rejections issued by the Patent Office.

In related work, Aashish Karkhanis and Jenna Parenti have identified correlations between specific terms in a patent claim and patent eligibility.45 Our work differs from and expands upon that of Karkhanis and Parenti in a number of ways. First, we rely on the decisions made by patent examiners rather than judges.46 The number of claims that have been evaluated for eligibility under Alice in the Patent Office is several orders of magnitude larger than the number of claims that have been similarly evaluated by the courts.47 This means that we have significantly more data to utilize for analysis and machine learning efforts. Second, we have developed a computer program that mechanizes a human decision-making process by exploiting relationships between claim terms and validity to classify claims as valid or invalid. Third, we use our mechanism to estimate the impact on the body of patents issued prior to the Alice decision.

Predicting potential Alice-based validity issues provides benefits in every phase of the patent lifecycle. For example, such predictions can be employed to determine whether to even file a patent application for a given invention.

45 Aashish Karkhanis and Jenna Parenti, Toward an Automated First Impression on Patent Claim Validity: Algorithmically Associating Claim Language with Specific Rules of Law, 19 STAN. TECH. L. REV. 196 (2016). 46 Id. at 215. 47 Tran, supra note 10, at 354 (568 patents have been challenged under Alice as of June, 2016).


If it is possible, a priori, to cheaply determine whether a particular invention is directed to patent-ineligible subject matter, then a client may be saved tens of thousands of dollars in legal fees. While legal fees spent in pursuit of an invalid patent will surely enrich the patent attorney who receives them, such fees represent economic waste. Wasted legal fees are resources that could be more productively and efficiently employed in some other context.

In patent preparation or prosecution, predicting subject matter eligibility issues can help attorneys better craft or amend claims. For a given patent claim, such a prediction may serve as an “early warning” sign that can help put the client and attorney on notice that a claim as drafted may be rejected by the Patent Office on subject matter grounds. The claim drafter can then iteratively modify the claim to settle on more detailed claim language that may not suffer from the abstractness issues that trigger a typical Alice rejection. Iteratively obtaining feedback from a machine is much cheaper than doing so with a patent examiner. As noted above, each interaction with an examiner results in thousands of dollars in legal fees to the client. Reducing the number of interactions with the examiner yields considerable savings to the client and the examining corps, and thus increases economic efficiency.

And provided that we can use the Patent Office as a proxy for the decision making of the Federal Courts,48 our predictive techniques can be used to identify weaknesses in asserted claims during the enforcement of a patent. For example, during pre-suit investigation, a patentee could predict whether a given patent claim is likely to be held invalid by the court. Providing patentees with such pre-suit information, coupled with the threat of fee shifting under Octane Fitness,49 may result in a sharp decrease in baseless patent litigation.

Note that we are not claiming that our predictive tool will reduce the amount of effort required to prepare patent claims. We instead assert that in those cases where the invention is clearly directed to unpatentable subject matter, no patent claims will be prepared at all, resulting in savings to the client. In cases where the invention is on the borderline of patentability, the patent attorney may in fact spend more time crafting claims that can avoid a subject matter rejection. Although this will result in higher up-front costs to the client, the client will typically save in the long run, as the number of interactions with the patent office will be reduced.

Nor does our system test claims for every possible basis of invalidity. Patent claims may of course be invalid for many reasons, including for a lack of utility, anticipation or obviousness in view of the prior art, indefiniteness, or a lack of written description.50

48 See infra Section 8.B, validating the performance of our classifier with respect to claims analyzed by the Court of Appeals for the Federal Circuit. 49 See supra note 34. 50 35 U.S.C. § 101 (subject matter and utility), 102 (anticipation), 103 (obviousness) and 112 (definiteness and written description).


Instead, our system only determines whether a given claim is directed to patent-eligible subject matter under 35 U.S.C. § 101. While automatically determining validity under other statutory grounds is an open area of research, it is not addressed here.

In conclusion, we have presented a case for predictive technologies, such as our machine classifier, which can assist patent practitioners in efficiently analyzing claims for compliance with Alice. Such analysis can yield significant economic efficiencies at nearly every stage of the patent lifecycle, including patent application preparation, prosecution, and enforcement. In the following section, we provide an overview of the data collection method that we use to obtain data for training our machine classifier.

4. DATA COLLECTION METHODOLOGY

In this section, we describe our data collection methodology, and more specifically our process for creating a dataset for training our machine classifier. In brief, we obtain thousands of office actions issued by the Patent Office, each of which is a written record of an examiner’s analysis and decision of a particular patent application. We then process the office actions to determine whether the examiner has accepted or rejected the pending claims of the application under Alice. We then create a table that associates, in each row, a patent claim with an indication of whether the claim passes or fails the Alice test.

Our method relies on as raw material those patent applications that have been evaluated by the Patent Office for subject-matter eligibility under Alice. In the Patent Office, each patent application is evaluated by a patent examiner, who determines whether or not to allow the application.51 Under principles of “compact prosecution,” the examiner is expected to analyze the claims for compliance with every statutory requirement for patentability.52 The core statutory requirements include those of patent-eligible subject matter, novelty, and non-obviousness. 53 If the examiner determines not to allow an application, the examiner communicates the rejection to an applicant by way of an “office action.”54 An office action is a writing that describes the legal bases and corresponding factual findings supporting the rejection of one or more claims.55

Our approach inspects office actions issued after the Alice decision in order to find examples of patent eligible and ineligible claims. As will be described in detail below, these examples are employed in a supervised machine learning application that trains a classifier to recognize eligible and ineligible claims. If the office action contains an Alice rejection, then the rejected claim is clearly an example of a patent-ineligible claim.

51 35 U.S.C. § 131; 37 C.F.R. 1.104. 52 35 U.S.C. § 132; Manual of Patent Examining Procedure § 2103 (“Under the principles of compact prosecution, each claim should be reviewed for compliance with every statutory requirement for patentability in the initial review of the application, even if one or more claims are found to be deficient with respect to some statutory requirement.”) 53 35 U.S.C. § 101, 102, and 103, respectively. 54 35 U.S.C. § 132; 37 C.F.R. § 1.104; Manual of Patent Examining Procedure § 706. 55 Id.


On the other hand, if the office action does not contain an Alice rejection, then the claims of the application provide examples of patent-eligible claims, because we assume that the examiner has evaluated the claims with respect to all of the requirements of patentability, including Alice compliance. If no Alice rejection is present in an office action, then the examiner must have determined that the claims were directed to eligible subject matter.

The goal, therefore, is to find office actions issued after the time at which the Patent Office at large began examining cases for compliance with the rule of Alice. Alice was decided on June 19, 2014. The Patent Office issued preliminary instructions for subject-matter eligibility examination on June 25, 2014.56 These instructions were supplemented and formalized in the 2014 Guidance, issued December 16, 2014.57 In view of this regulatory history of the Patent Office, and partly based on personal experience receiving Alice rejections, we selected October, 2014 as the relevant cutoff date.58 Any office action issued after the cutoff date therefore represents an evaluation of a patent application under Alice.

The following outlines the steps of our data collection process. As a background step, we created a patent document corpus. The patent document corpus is based on full text data provided by the Patent Office for every patent issued since 1996 and every application published59 between 2001 and the present.60 We store some of the patent and application data in a full text index.61 The index includes fields for document type (e.g., application or patent), dates (e.g., filing date, publication date, issue date), document identifiers (e.g., application number, publication number, patent number), technical classification, title, abstract, claim text, and the like. At the time of writing, approximately 4.8 million published applications and 4.0 million patents have been indexed. The use of the patent document corpus will be described further below.
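As one way to picture the corpus, the sketch below adds a single patent document to a Solr core using the pysolr client and then queries it back. The URL, core name, and exact field names are illustrative assumptions; the article identifies Solr as the indexing engine (note 61) but does not publish its schema.

```python
# Sketch of adding one document to a full-text index like the one described
# above. The Solr URL, core name "patents", and field names are assumptions.
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/patents", timeout=10)

solr.add([{
    "id": "US20150000001A1",             # hypothetical publication number
    "doc_type": "application",           # application or patent
    "publication_date": "2015-01-01T00:00:00Z",
    "application_number": "14/000,001",  # hypothetical
    "classification": "G06Q 30/02",
    "title": "Method for delivering targeted content",
    "abstract": "A method for selecting and delivering content ...",
    "claim_1": "A computer-implemented method comprising ...",
}])
solr.commit()

# Claim text can later be pulled back by application number.
results = solr.search('application_number:"14/000,001"', fl="claim_1")
```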

Figure 1, below, is a generalized flow diagram that illustrates the data collection operations performed to obtain office actions for analysis.

56 Preliminary Examination Instructions, supra note 5. 57 2014 Guidance, supra note 12. 58 The data analysis presented in infra Plot 1 supports our decision to use October, 2014 as the cutoff date. 59 35 U.S.C. § 122; 37 C.F.R. § 1.104. 60 USPTO Bulk Data includes Patent Grant Full Text Data and Patent Application Full Text Data. The data is hosted by the USPTO and third party vendors, including Google USPTO Bulk Downloads, https://www.google.com/googlebooks/uspto-patents.html and Reed Tech USPTO Data Sets, http://patents.reedtech.com/index.php. 61 For text indexing, we use Apache Software Foundation, Apache Solr, http://lucene.apache.org/solr/.


Figure 1: Data Collection Process
• Download file histories (~180K large ZIP files)
• Unpack ZIP files and extract office actions (~90K PDF files)
• OCR the office action PDF files (~90K TXT files)
• Select office actions from the relevant time period (~32K documents)
• Determine whether each office action contains an Alice rejection
• If NO: label the application ACCEPT (~26K applications) and pull and label claim 1 for each application that issued as a patent
• If YES: label the application REJECT (~3,000 applications) and pull and label published claim 1 for each application
• Downstream uses: “Manual evaluation ...” and “Machine learning ...”

Initially, we collect the file histories (“file wrappers”) for a randomly selected set of application numbers.62 Each file history is a ZIP archive file that includes multiple documents, including the patent application as filed, office actions, notices, information disclosure statements, applicant responses, claim amendments, and the like. At the time of this writing, over 180,000 file histories have been downloaded.

As discussed above, we are interested in finding office actions issued by the Patent Office on or after October, 2014. The Patent Office uses a naming convention to identify the files within a file history. For example, the file 12972753-2013-04-01-00005-CTFR.pdf is a Final Rejection (as indicated by the document code CTFR) dated April 1, 2013, for application number 12/972,753. When a file history ZIP file is unpacked, the document code can be used to identify relevant documents, which in this case are Non-Final and Final Rejections.63
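The naming convention lends itself to simple mechanical filtering. The sketch below parses the member names of a downloaded file-history ZIP archive and keeps only Non-Final and Final Rejections (document codes CTNF and CTFR). The regular expression reflects our reading of the convention illustrated above and is an assumption, not an official specification.

```python
# Sketch: select office actions from a file-history ZIP based on the
# naming convention illustrated above, e.g. 12972753-2013-04-01-00005-CTFR.pdf
# (application number, mailing date, sequence number, document code).
import re
import zipfile

NAME_RE = re.compile(
    r"(?P<appl>\d{8})-(?P<date>\d{4}-\d{2}-\d{2})-\d+-(?P<code>[A-Z.]+)\.pdf$"
)
OFFICE_ACTION_CODES = {"CTNF", "CTFR"}   # Non-Final and Final Rejections

def office_actions(zip_path):
    """Yield (application number, mailing date, member name) for each
    office action found in one file-history archive."""
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            m = NAME_RE.search(name)
            if m and m.group("code") in OFFICE_ACTION_CODES:
                yield m.group("appl"), m.group("date"), name
```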

At the time of this writing, about 90,000 office actions have been obtained. The number of office actions is smaller than the number of total file histories because many of the file histories are associated with applications that have yet to be examined, and therefore do not contain any office actions.

62 See supra note 60. 63 Identified by the document codes CTNF and CTFR, respectively.


Each office action in the file history is a PDF file that includes TIFF images of the pages of the document produced by the examiner. Because each page of an office action is represented as a TIFF image, each page must be run through an optical character recognition (OCR) module to convert the PDF file to a text file.64
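A minimal OCR pass over one office action might look like the following. The article identifies Tesseract as the OCR engine (note 64) but does not say how it was invoked, so the pdf2image and pytesseract wrappers used here are assumptions for illustration.

```python
# Sketch: convert a scanned office action PDF (TIFF page images) to text.
# Uses pdf2image (requires poppler) and pytesseract (requires Tesseract);
# these wrappers are assumptions, only Tesseract itself is named in the article.
import pytesseract
from pdf2image import convert_from_path

def ocr_office_action(pdf_path: str) -> str:
    pages = convert_from_path(pdf_path, dpi=300)   # one PIL image per page
    return "\n".join(pytesseract.image_to_string(page) for page in pages)

text = ocr_office_action("12972753-2013-04-01-00005-CTFR.pdf")
```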

Once an office action is converted to text, it can be searched for strings that are associated with Alice rejections. An example Alice rejection reads as follows:

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: “Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.”

Claim 1, 9 & 17 are rejected under 35 U.S.C. 101 because the claimed invention is not directed to patent eligible subject matter. Based upon consideration of all of the relevant factors with respect to the claim as a whole, claim(s) 1, 9 & 7 are determined to be directed to an abstract idea. ... The claim(s) are directed to the abstract idea of organizing human activities utilizing well known and understood communication devices and components to request and receive multimedia content by a customer.65

Patent examiners tend to rely on form paragraphs provided by the Patent Office when making or introducing a rejection, so there is fortunately a high level of consistency across office actions.66 Text strings such as the following were used to identify actions that contained an Alice rejection: “35 USC 101,” “abstract idea,” “natural phenomenon,” and the like.67
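Given OCR'd text, detecting an Alice rejection then reduces to a string search over the phrases just described (see also note 67). The sketch below states that rule as we read it; the exact matching logic behind the reported counts may differ, and, as note 67 observes, a rare false positive is possible.

```python
# Sketch: label an office action by searching its OCR'd text for phrases
# associated with Alice rejections (see note 67). A rare false positive is
# possible, e.g. "The claims are not directed to an abstract idea."
import re

ALICE_PATTERNS = [
    re.compile(r"abstract\s+idea", re.IGNORECASE),
    re.compile(r"natural\s+phenom\w*", re.IGNORECASE),
]

def label_office_action(text: str) -> str:
    """Return "REJECT" if the action appears to contain an Alice
    rejection, otherwise "ACCEPT"."""
    return "REJECT" if any(p.search(text) for p in ALICE_PATTERNS) else "ACCEPT"
```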

From the full set of obtained office actions, we selected those issued on or after October, 2014, a total of about 32,000 office actions. We then analyzed each office action in this subset to determine whether it contained an Alice rejection.

64 Optical character recognition provided by Tesseract, https://github.com/tesseract-ocr. 65 U.S. Patent Application No. 14/543,715, Office Action dated December 17, 2014. 66 See, e.g., Manual of Patent Examining Procedure § 706, Form Paragraph 7.05.015 (“the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. Claim(s) [1] is/are directed to [2]. The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception because [3].”) 67 Specifically, we identify Alice rejections by searching for the strings “abstract idea” and “natural phenom*”. While this technique is efficient, it does result in the rare false positive, such as when an examiner writes, “The claims are not directed to an abstract idea.”


Based on the 32,000 office actions issued during the relevant period, about 26,000 applications have been identified as eligible, and 3,000 as ineligible.

The next step of the process is to identify the claim that is subject to the Alice rejection. Typically, the examiner will identify the claims rejected under a particular statutory provision. For example, the examiner may write “Claims 1, 3-5, and 17-20 are rejected under 35 USC 101 …” Ideally, we would parse this sentence to identify the exact claims rejected under Alice. However, we made the simplifying assumption that, at a minimum, the first independent claim (typically claim 1) was being rejected under Alice.69

We make another simplifying assumption to find the actual claim text rejected under Alice. In particular, we pull the text of the first independent claim (“claim 1”) of the published patent application stored in the patent document corpus described above. Note that this claim is typically the claim that is filed with the original patent application, although it is not necessarily the claim that is being examined when the examiner makes the Alice rejection. For example, the applicant may have amended claim 1 at some time after filing and prior to the particular office action that includes the Alice rejection. However, it is unlikely that the claim 1 pending at the time of the Alice rejection is markedly different from the originally filed claim 1. If anything, the rejected claim is likely to be more concrete and less abstract due to further amendments that have been made during examination.

We use claim 1 from the published application because it can be efficiently and accurately obtained. Each patent file history contains documents that reflect the amendments made to the claims by the applicant. It is therefore technically possible to OCR those documents to determine the text of the claims pending at the time of an Alice rejection. However, because applicants reflect amendments to the claims by using strikethrough and underlining, these text features greatly reduce the accuracy of our OCR system. In the end, we decided to rely on the exact claim text available from the patent document corpus instead of the degraded OCR output of the actual claim subjected to the Alice rejection. Further work will show whether this assumption had a significant impact on the results presented here.

For applications that were examined during the relevant time period but that were not subject to an Alice rejection (that is, they “passed” the test), we prefer to use claim 1 from the patent (if any) that issued on the corresponding application. Claim 1 from the issued patent is preferred, because it reflects the claim in final form, after it has been evaluated and passed all of the relevant statutory requirements, including subject matter eligibility under Alice, based on the existence of an office action issued after October, 2014.

68 Note that it is possible for an application to be labeled both REJECT and ACCEPT, due to a first office action that includes an Alice rejection and a second office action that does not include an Alice rejection. 69 It should never be the case that a dependent claim will be rejected under Alice if its corresponding independent claim is not rejected under Alice, as dependent claims are strictly narrower than their parent claims. Moreover, based on the author’s personal experience as a patent prosecutor, it is very rare that an examiner will, within one set of claims, allow one independent claim under Alice while rejecting another. Typically, all of the claims rise and fall together under Alice, since the analysis is intentionally designed to ferret out abstractions even when they are claimed in the more mechanical claim formats (e.g., apparatus vs. method). Our simplifying assumption was supported via a manual spot check of over a hundred cases: the examiner reached different conclusions for different independent claims in only a handful of applications.


If there is no issued patent, such as because the applicant and examiner are still working through issues of novelty or non-obviousness, we currently elect not to use claim 1 from the published patent application. For machine learning purposes, this exclusion has resulted in slightly improved performance, possibly because the claims evaluated for Alice compliance in those cases were markedly different from the published claim 1.

Note that the above-described process was iteratively and adaptively performed. From an initial random sample of patent applications, it was possible to identify those classes where Alice rejections were common. Then, we preferentially obtained additional applications from those Alice rejection-rich classes, in order to increase the likelihood of obtaining office actions that contained Alice rejections.

Once we perform the above-described data collection, the identified claims are stored in a table. Each row of the table includes an application number, a patent classification identifier, an Alice-eligibility indicator (e.g., a tag of “accept” or “reject”) denoting whether the claim was accepted or rejected by the examiner, and the text of the claim itself. Patent classification is used by the Patent Office to group patents and applications by subject matter area.70 The patent classification identifier is a Cooperative Patent Classification (CPC) scheme identifier that is obtained from the patent document corpus, as assigned by the Patent Office to each patent application and issued patent.71 Retaining the patent classification allows us to break down analysis results by subject matter area.
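
For illustration, one plausible representation of a row in this table is sketched below; the field names are hypothetical, but the contents correspond to the four columns just described.

    import csv
    from dataclasses import dataclass

    # Illustrative record layout for the claims table; field names are hypothetical.
    @dataclass
    class ClaimRecord:
        application_number: str  # e.g., "14/543,715"
        cpc_class: str           # Cooperative Patent Classification symbol, e.g., "G06Q"
        label: str               # "accept" or "reject"
        claim_text: str          # text of the first independent claim

    def load_claims(csv_path):
        with open(csv_path, newline="") as f:
            return [ClaimRecord(**row) for row in csv.DictReader(f)]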

In this section we have described our process for creating our dataset. In Section 5, next, we present an analysis of the obtained data. In Section 6, below, we use the obtained data to train a machine classifier to determine whether or not a claim is directed to patent eligible subject matter.

5. DATA ANALYSIS RESULTS

In this section, we present results from a data analysis of office actions issued by the Patent Office. We seek to answer two questions. First, can we “see” the impact of Alice in the actions of the Patent Office? Put another way, do we see an increase in the number of subject matter rejections coming out of the Patent Office? Second, which subject matter areas, if any, are disproportionately subject to Alice rejections?

Our analysis provides an aggregate view of the impact of Alice in the Patent Office. We show that, as our intuition would suggest, Alice has resulted in an increase in subject matter rejections, and that these rejections fall disproportionately into a few specific subject matter areas. Stepping back, our analysis shows that the impact of Alice on the stream of data produced by the Patent Office is not random – instead, it includes a pattern or signal that can be recognized by machine learning techniques, as shown below in Section 6.

70 35 U.S.C. § 8; Manual of Patent Examining Procedure §§ 902, 903, and 905. 71 Manual of Patent Examining Procedure § 905; USPTO, Classification Standards and Development, https://www.uspto.gov/patents-application-process/patent-search/classification-standards-and-development.



To begin our data analysis, we pulled a set of file histories from a uniform random sample of about 20,000 patent applications filed during or after 2013.72 In this random sample, we found a total of 7367 office actions arising from 7160 unique applications that had received at least one office action. Of these 7160 unique applications, 460 (6.4%) included at least one Alice rejection.

Plot 1, below, supports our selection of the relevant time period as beginning after October, 2014. Subject matter rejections were identified by searching for particular text strings in office actions, as described above. Plot 1 shows the monthly fraction of office actions containing a subject matter rejection, based on our random sample of about 20,000 patent applications: for each month, we counted how many of the sample’s office actions issued in that month contained a subject matter rejection. The error bars provide the 95% confidence interval for the measured proportion.

Plot 1: Subject Matter Rejections by Month
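
A sketch of the computation behind Plot 1 follows, assuming the office actions are held in a pandas DataFrame with a mailing-date column and a boolean rejection flag (both hypothetical names); the error bars use the normal-approximation 95% confidence interval for a proportion.

    import numpy as np
    import pandas as pd

    # Monthly fraction of office actions containing a subject matter rejection,
    # with normal-approximation 95% confidence intervals.
    def monthly_rejection_rates(actions: pd.DataFrame) -> pd.DataFrame:
        grouped = actions.groupby(actions["mail_date"].dt.to_period("M"))
        out = grouped["has_alice_rejection"].agg(["mean", "count"])
        stderr = np.sqrt(out["mean"] * (1 - out["mean"]) / out["count"])
        out["ci_low"] = out["mean"] - 1.96 * stderr
        out["ci_high"] = out["mean"] + 1.96 * stderr
        return out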

72 As described above, we preferentially searched for applications in classes that are rich in Alice rejections. This process, while useful for finding examples of rejections for purposes of machine learning, skews the data set in favor of particular subject matter classes where Alice rejections are common. This skew complicates any reporting of generalized statistics or distribution of rejections. For this reason, the data analysis presented here is based on a random sample of applications.


In Plot 1, a marked increase in subject matter rejections occurs in the July/August, 2014 timeframe. This increase is consistent with the publication of preliminary examination instructions for subject-matter eligibility by the Patent Office on June 25, 2014.73

Plot 2, below, tells us which subject matter areas are subject to high numbers of Alice rejections. The plot is based on the same random sample of about 20,000 patent applications discussed above. The graph shows the total number, by Cooperative Patent Classification (CPC) class,74 of applications that have at least one Alice rejection. CPC classes having fewer than 3 rejections were eliminated from consideration.

Plot 2: Alice rejections by CPC class

Table 1, below, provides descriptions for many of the CPC classes shown in Plot 2. The classes with the highest number of rejections are G06Q (business methods), G06F (digital data processing), and H04L (digital information transmission).

Plot 3, below, provides another view of the random sample of 20,000 applications. The graph breaks down the total number of cases in each class into those that have at least one Alice rejection and those without. The black and light grey bars respectively represent the number of cases with and without an Alice rejection.

73 Preliminary Examination Instructions, supra note 5. 74 The Patent Office groups applications by subject matter area using the Cooperative Patent Classification scheme. Supra note 71.


Plot 3: Cases With/Without Alice Rejection by CPC class

As can be seen in Plot 3, the total sample size is quite small for many of the classes. To obtain more precise estimates for those classes, we performed further data collection, focusing on those CPC classes from our random sample that included at least some Alice rejections. This additional data collection resulted in a larger, non-uniform sample of about 38,000 office actions, of which about 3500 included an Alice rejection.

Plot 4, below, tells us the percentage of applications in each class that have been subjected to an Alice rejection. Plot 4 was generated based on our larger, non-uniform data sample described above. The data in this sample was purposely skewed towards those classes where Alice rejections are more common, in order to determine a more statistically accurate rejection rate for those classes.


Plot 4: Percentage of Alice rejections by CPC class

It is notable that in several subject matter areas, over 40% of the applications are subjected to Alice rejections. Table 1, below, provides the titles for the CPC classes shown in Plot 4, above.


Table 1: CPC class descriptions

Class | Rejection Rate | n | Standard Error | Description
A61B | 5.5% | 1424 | 0.6 | DIAGNOSIS; SURGERY; IDENTIFICATION
A61K | 3.9% | 2086 | 0.4 | PREPARATIONS FOR MEDICAL, DENTAL, OR TOILET PURPOSES
A63F | 43.3% | 254 | 3.1 | CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES …; VIDEO GAMES
B25J | 10.2% | 108 | 2.9 | MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
B60W | 12.6% | 207 | 2.3 | CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION …
C07K | 7.6% | 514 | 1.2 | PEPTIDES
C12N | 7.9% | 164 | 2.1 | MICRO-ORGANISMS OR ENZYMES; COMPOSITIONS THEREOF
C12Q | 22.4% | 183 | 3.1 | MEASURING OR TESTING PROCESSES INVOLVING ENZYMES OR MICRO-ORGANISMS …
F01N | 12.3% | 162 | 2.6 | GAS-FLOW SILENCERS OR EXHAUST APPARATUS
G01C | 20.6% | 218 | 2.7 | MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; …
G01N | 9.1% | 942 | 0.9 | INVESTIGATING OR ANALYSING MATERIALS …
G06F | 10.2% | 6501 | 0.4 | ELECTRICAL DIGITAL DATA PROCESSING
G06K | 5.6% | 268 | 1.4 | COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
G06Q | 65.9% | 1801 | 1.1 | DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; …
G06T | 12.7% | 675 | 1.3 | IMAGE DATA PROCESSING OR GENERATION
G07F | 51.6% | 310 | 2.8 | COIN FEED OR LIKE APPARATUS
G08G | 21.8% | 55 | 5.6 | TRAFFIC CONTROL SYSTEMS
G09B | 39.7% | 131 | 4.3 | EDUCATIONAL OR DEMONSTRATION APPLIANCES; … MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
G10L | 28.4% | 201 | 3.2 | SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; …
H04B | 4.2% | 240 | 1.3 | TRANSMISSION
H04L | 9.9% | 3272 | 0.5 | TRANSMISSION OF DIGITAL INFORMATION
H04M | 9.7% | 632 | 1.2 | TELEPHONIC COMMUNICATION
H04N | 4.2% | 2277 | 0.4 | PICTORIAL COMMUNICATION E.G. TELEVISION
H04W | 3.9% | 2401 | 0.4 | WIRELESS COMMUNICATION NETWORKS

As shown in Table 1, the subject matter areas subject to the most (by percentage) Alice rejections include “business methods” – data processing systems for administration, commerce, finance, and the like (class G06Q); games (classes A63F and G07F); educational devices, globes, maps, and diagrams (class G09B); and speech analysis, synthesis, and recognition (G10L).

In this section we have presented an analysis of our dataset. Our analysis indicates that the subject matter areas that contain many applications with Alice rejections include data processing, business methods, games, educational methods, and speech processing. In the following section, we use our dataset to train a machine classifier to predict Alice rejections.

6. PREDICTING ALICE REJECTIONS WITH MACHINE CLASSIFICATION

Our core research goal is to predict, based on the text of a patent claim, whether the claim will be rejected under Alice. One way to make this prediction is to cast the exercise as a document classification problem. In document classification, a document is assigned to a particular class or category based on features of the document, such as words, phrases, length, or the like. 75

Document classification can be automated by implementing on a computer the logic used to make classification decisions. One very successful example of automated document classification is found in the “spam” filter provided by most modern email services.76 A typical spam filter classifies each received email into spam (i.e., junk mail) or non-spam (i.e., legitimate email) based on features of the email, such as its words, phrases, header field values, and the like.

While it is possible to manually implement the decision logic used to classify documents, it is more common to use machine learning. At a high level of generality, machine learning is a technique for training a model that associates input features with outputs. 77 Machine learning can be supervised or unsupervised.78 In supervised learning, a “teacher” trains a model by presenting it with examples of input and output pairs.79 The model is automatically adjusted to express the relationship between the observed input-output pairs.80 The model can then be validated by testing it against novel inputs and tallying how often the model makes the correct classification.81 In unsupervised learning, the goal is to identify patterns in a dataset without guidance provided by a teacher.82 In unsupervised learning, a model is generated without the use of input-output pairs as training examples.83

75 RUSSELL & NORVIG, supra note 2, at 865. 76 See e.g., Mehran Sahami et al., A Bayesian Approach to Filtering Junk E-Mail, AAAI'98 WORKSHOP ON LEARNING FOR TEXT CATEGORIZATION 55 (1998). 77 RUSSELL & NORVIG, supra note 2, at 693-695. 78 Id. at 695. 79 Id. 80 See, e.g., RUSSELL & NORVIG, supra note 2, at 697-703 (discussing an approach to generating a decision tree based on observed examples). 81 Id. at 708-09. 82 Id. at 694. 83 Id.


Our methodology employs supervised learning. Our goal is to teach a machine to classify a patent claim as eligible or ineligible based on its words. The input-output pairs used for training are obtained from our dataset, described above, which associates patent claims with corresponding classifications – eligible or not eligible – made by human examiners at the Patent Office. As described in detail below, these classifications can be used to train and evaluate various common supervised machine learning models.

In the following subsections, we begin with an exploratory analysis that identifies particular words that are associated with eligibility or ineligibility under Alice. The presence of such associations indicates that there exist patterns that can be learned by way of machine learning. Next, we describe the training, testing, and performance of a baseline classifier, in addition to techniques for an improved classifier.

A. Word Clouds

To gauge whether it would even be possible to predict subject matter rejections based on the words of a patent claim, we first explored the associations between claim terms and Alice rejections. Plot 5, below, includes two word clouds that can be used to visualize such associations. In Plot 5, the left word cloud depicts words that were highly associated with acceptable subject matter, where the words are sized based on frequency. The right word cloud depicts words highly associated with unacceptable subject matter. Each word cloud was formed by building a frequency table, which mapped each word to a corresponding frequency. A first frequency table was built for eligible claims, and a second frequency table was built for ineligible claims. The N most frequent words from each table were then displayed as a word cloud, shown below.


Plot 5: Raw frequency word clouds (left: Eligible; right: Ineligible)

Note that many terms, such as “method,” “data,” and “device,” appear with high frequency in both accepted (left cloud) and rejected claims (right cloud). This is not surprising, as these are very common words in the patent claim context. For our purposes, however, such terms are not useful, because they do not distinguish one class from the other.

Plot 6, below, includes two frequency word clouds without common terms. Put another way, the words shown in the clouds of Plot 6 are sized based on the absolute value of the difference of the frequencies in Plot 5. Thus, a word that is equally common in both data sets (accepted and rejected claims) should not appear in either cloud.


Plot 6: Raw frequency without common terms (left: Eligible; right: Ineligible)
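
A simplified sketch of the frequency tables behind Plots 5 and 6 is shown below; the tokenization here is a plain lowercase split, which is cruder than the stemming-based pipeline described later, and the variable names are illustrative.

    from collections import Counter

    def relative_frequencies(claims):
        counts = Counter()
        for claim in claims:
            counts.update(claim.lower().split())
        total = sum(counts.values())
        return {word: n / total for word, n in counts.items()}

    # Terms whose frequencies differ most between the two sets (Plot 6); the
    # word clouds size terms by the magnitude of this difference.
    def distinguishing_terms(eligible_claims, ineligible_claims, top_n=50):
        f_el = relative_frequencies(eligible_claims)
        f_in = relative_frequencies(ineligible_claims)
        words = set(f_el) | set(f_in)
        diff = {w: f_el.get(w, 0.0) - f_in.get(w, 0.0) for w in words}
        eligible_terms = sorted(words, key=lambda w: diff[w], reverse=True)[:top_n]
        ineligible_terms = sorted(words, key=lambda w: diff[w])[:top_n]
        return eligible_terms, ineligible_terms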

The word clouds of Plot 6 do a better job of matching our intuition about what kinds of words might be associated with Alice rejections. In the right (ineligible) word cloud, we see terms such as “method,” “computer,” “information,” “associated,” “transaction,” “payment,” “account,” and “customer.” These are all words that would be used to describe business and financial methods, techniques that are in the crosshairs of Alice. On the left side, in the eligible word cloud, there are more terms that are associated with physical structures, including “portion,” “formed,” “surface,” “connected,” “disposed,” “configured,” “material,” and the like.

Table 2, below, lists the top 20 claim terms that are respectively associated with patent eligibility or ineligibility.

Table 2: Words Strongly Associated With (In)eligibility

Eligible Claims: surface, portion, connected, end, disposed, formed, configured, direction, extending, side, control, layer, signal, substrate, body, material, light, arranged, member, form

Ineligible Claims: method, computer, associated, determining, receiving, information, user, system, data, processor, implemented, providing, transaction, generating, identifying, storing, account, database, payment, game


B. Classifier Training

As noted above, predicting whether a patent claim will be subject to an Alice rejection is a classification problem, similar to that of detecting whether an email message is spam. At the end of the above-described data collection process, we are in possession of a data set that includes about 20,000 claims, each of which is labeled as “accept” (subject matter eligible – no Alice rejection issued) or “reject” (subject matter ineligible – Alice rejection issued).84 Roughly 85% of the claims in the data set are patent eligible, while the remaining 15% are ineligible.85

The dataset is then used to train a classifier, which is a mathematical model that maps input features of a document to one or more classes. For example, as discussed above in the context of a spam filter for email, a classifier determines the class of an email (i.e., spam or not spam) based on its features (e.g., words). Similarly, in our application, we generate a classifier that determines the class of a patent claim (i.e., valid or invalid) based on its features (e.g., words).

In machine learning, a classifier can be trained in a supervised manner by showing the classifier many examples of each class.86 The classifier learns that certain features or combinations of features are associated with one class or the other. 87 In our case, we elected to show the classifier the words of a claim, without reference to their order, meaning, or location in the claim. This is sometimes known as a “bag of words” approach, in which a text passage is converted into a frequency table of terms.88 The words of the claim provide the input features for the classifier; the output of the classifier is an indication of whether the claim is patent eligible or not.

Prior to training, we stemmed the words of the claims.89 Stemming converts distinct words having the same roots but different endings and suffixes into the same term. Stemming reduces the number of distinct words being analyzed in a machine learning scenario. For example, words such as “associate,” “associated,” “associating,” and “associates” may all be converted to the term “associ.”90
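
A minimal sketch of this preprocessing is given below, assuming NLTK's implementation of the Porter stemmer and scikit-learn's CountVectorizer for the bag-of-words step; the exact preprocessing used in this project may differ.

    from nltk.stem.porter import PorterStemmer
    from sklearn.feature_extraction.text import CountVectorizer

    stemmer = PorterStemmer()

    def stem_claim(claim_text):
        # e.g., "associated" and "associating" both become "associ"
        return " ".join(stemmer.stem(token) for token in claim_text.lower().split())

    # Bag-of-words features: each claim becomes a vector of term counts.
    vectorizer = CountVectorizer()
    # X = vectorizer.fit_transform(stem_claim(c) for c in claim_texts)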

84 As discussed in detail above, the “accept” claims in the data set were those obtained from applications that had received an office action during the relevant time period (post October, 2014), so that we could be reasonably certain that an examiner had evaluated the claim for Alice compliance. For machine learning purposes, we limited the claims in the ACCEPT class to those from patents that issued during the relevant time period, because we could be confident that those claims were in the form actually examined and approved by the examiner. This reduced the number of total claims from about 29,000 to 22,000. 85 Our full training set included 21,693 claims, of which 2963 (about 13.7%) were rejected as abstract. 86 See, e.g., RUSSELL & NORVIG, supra note 2, at 697-703 (discussing an approach to generating a decision tree based on observed examples). 87 Id. 88 Id. at 866. 89 Martin Porter, An Algorithm for Suffix Stripping, 14 PROGRAM 130-138 (1980). 90 By converting variations of a word onto a single term, stemming has the effect of condensing a sparse feature set. With a sufficiently large number of samples, stemming may not be necessary, and may even degrade classifier


When training a classifier, the dataset is typically split into two subsets, a training set and a testing set. 91 The classifier is not exposed to examples in the testing set until after training is complete, in order to obtain a true gauge of the classifier’s performance. In our case, we set aside 20% of the cases for the test set and used the remaining cases for training.

We then trained our classifiers using the remaining ineligible claims (about 2400) and the same number of randomly selected eligible claims. By adjusting the mix of eligible and ineligible training examples, a classifier can be made more or less biased towards the majority class. 92 In our case, an even split was used to make the classifier more likely to recognize claims as ineligible, at the cost of introducing additional false positives, that is, claims classified as ineligible that are in fact eligible. As discussed further below, the mix of eligible and ineligible training examples can be adjusted to tune classifiers to emphasize particular performance metrics.
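
The following sketch illustrates this training setup, holding out 20% of the data and undersampling the eligible majority class to a chosen ratio before fitting; X and y stand for the bag-of-words matrix and the "accept"/"reject" labels from the preceding step, and all parameter values are illustrative.

    import random
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    def balanced_indices(labels, ratio=1.0, seed=0):
        # Keep all "reject" examples and ratio * (number of rejects) "accept" examples.
        rejects = [i for i, label in enumerate(labels) if label == "reject"]
        accepts = [i for i, label in enumerate(labels) if label == "accept"]
        random.Random(seed).shuffle(accepts)
        return rejects + accepts[: int(ratio * len(rejects))]

    def train_baseline(X, y, ratio=1.0, seed=0):
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=seed)
        y_train = list(y_train)
        keep = balanced_indices(y_train, ratio=ratio, seed=seed)
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X_train[keep], [y_train[i] for i in keep])
        return clf, X_test, y_test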

C. Performance of a Baseline Classifier

There exist many different machine classification techniques.93 Modern machine-learning toolkits provide implementations of multiple classifiers via uniform programming interfaces.94 In this study, we started by training a Logistic Regression classifier on an equal mix of eligible and ineligible example claims.95

Table 3: Logistic Regression Classifier Performance Results

Classifier: Logistic Regression96

Class | Precision | Recall | F-score | Accuracy | MCC
accept | 0.960 | 0.747 | 0.840 | |
reject | 0.332 | 0.803 | 0.470 | |
average | 0.875 | 0.755 | 0.790 | 0.755 | 0.401

Table 3 above provides performance data for a baseline Logistic Regression classifier.97 Performance was measured by first training the classifier using claims in the training data set, then exposing the trained classifier to claims in the test data set, and finally tabulating the resulting classifications.

performance. CHRISTOPHER MANNING ET AL., AN INTRODUCTION TO INFORMATION RETRIEVAL 339 (Online Ed. 2009), https://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf. 91 RUSSELL & NORVIG, supra note 2, at 695. 92 Yanmin Sun et al., Classification of Imbalanced Data: A Review, 23 INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE 687-719 (2009). 93 See e.g., RUSSELL & NORVIG, supra note 2, at 717-53 (discussing various approaches to supervised learning, including decision trees, logistic regression, and neural networks). 94 Pedregosa et al., Scikit-learn: Machine Learning in Python, 12 JOURNAL OF MACHINE LEARNING RESEARCH 2825-2830 (2011). Scikit-learn Online Documentation, http://scikit-learn.org. 95 RUSSELL & NORVIG, supra note 2, at 725-27; Logistic Regression, http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression. 96 Logistic Regression, http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression. 97 RUSSELL & NORVIG, supra note 2, at 717-727.


The resulting classifications can be compared to the actual classifications (made by human examiners at the Patent Office) in order to determine how well the classifier performed.

Many metrics are available to evaluate the performance of a classifier. Precision is the fraction of the instances classified into a given class that are correctly classified.98 Our baseline classifier has a precision in the “accept” class of about 0.96. This means that for every 100 claims classified as patent eligible, about 96 of them are correctly classified. Recall is the fraction of relevant instances that are correctly classified.99 Our baseline classifier has a recall in the “accept” class of about 0.75. This means that if there are 100 claims that are patent eligible, the classifier will find (correctly classify) about 75 of them.

A number of aggregate performance metrics are also available. The F-score is the harmonic mean of precision and recall.100 Accuracy reflects the fraction of test cases correctly classified.101 Note that accuracy is not necessarily a useful metric in the presence of imbalanced data. For example, if 90% of the claims are eligible, a “null” classifier that classifies every claim as eligible will have an accuracy of 90%. Such a classifier would of course not be useful in the real world, but does provide a useful baseline for evaluating a machine learning model.

Matthews Correlation Coefficient (denoted MCC in the table) measures the quality of a binary classification. 102 It is effective even in the presence of imbalanced data, such as is present in this study.103 MCC is a value between -1 and +1, where +1 occurs when every prediction is correct, 0 occurs when the predictions appear random, and -1 occurs when every prediction is incorrect. Our baseline Logistic Regression classifier has an MCC score of 0.40.
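
All of the metrics discussed in this subsection are available in common machine-learning toolkits; the sketch below computes them for a held-out test set using scikit-learn, with clf, X_test, and y_test taken from the training sketch above.

    from sklearn.metrics import (accuracy_score, matthews_corrcoef,
                                 precision_recall_fscore_support)

    y_pred = clf.predict(X_test)
    precision, recall, f_score, _ = precision_recall_fscore_support(
        y_test, y_pred, labels=["accept", "reject"])
    accuracy = accuracy_score(y_test, y_pred)
    mcc = matthews_corrcoef(y_test, y_pred)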

As noted above, by changing the mix of training examples, we can adjust a classifier’s view of the ground truth of the world. For example, if during training a classifier sees a mix of 85% eligible cases and 15% ineligible cases, it will tend to be much more likely to classify a given example as eligible. If instead the classifier is trained on an equal mix of eligible and ineligible cases, we would expect it to be less biased towards eligible classifications. This relationship is illustrated in Plot 7, below.

98 Id. at 869. 99 Id. 100 Id. 101 Note that accuracy is the same as weighted average recall. 102 Matthews Correlation Coefficient, http://scikit-learn.org/stable/modules/generated/sklearn.metrics.matthews_corrcoef.html 103 Id.


Plot 7: Impact of Training Mix on Classifier Performance Metrics

The data for Plot 7 was obtained by training a Logistic Regression classifier using different mixes of eligible and ineligible example claims. Note that at a 1:1 ratio of eligible to ineligible training examples, the recall rate for the accept (eligible) and reject (ineligible) classes is roughly equal at 0.75. However, at that ratio, the reject precision is quite low, around 0.3, meaning that the classifier produces many false positives. The MCC metric, which considers true and false positives and negatives, begins to level off at around 0.45 at a ratio of about 2:1.

D. Performance of an Improved Classifier

We next attempted to develop a classifier that improved upon the MCC score for our baseline Logistic Regression classifier. We implemented our improved classifier as an ensemble of multiple different classifiers.104


Each classifier was trained using an adjusted example mix, using Plot 7 as a guide. By inspecting Plot 7 and running a number of trials, we learned that training our classifiers on a ratio of eligible to ineligible examples of 5:2 tended to maximize the MCC score.

The selection of the particular types of classifiers was to a large extent exploratory and arbitrary. The machine learning toolkit utilized in this study provides many different classifiers that each have a uniform interface for training and prediction.105 Thus, given our initial data set, it is almost trivial to experiment with different classification approaches known in the art. Training and testing multiple distinct classification schemes allowed us to understand whether some types of classifiers outperformed our baseline Logistic Regression classifier, above.

Ensemble classification aggregates the outputs of multiple classifiers, thereby attempting to overcome misclassifications made by any particular classifier.106 In our ensemble, we employed a voting scheme, in which the final classification for a given test case is the majority vote of the individual classifiers. Table 4, below, provides the performance results of our improved classifier.

Table 4: Tuned Machine Classification Performance Results

Classifier | Precision | Recall | F-score | Accuracy | MCC
Logistic Regression | 0.876 | 0.867 | 0.871 | 0.867 | 0.448
Naïve Bayes | 0.872 | 0.533 | 0.600 | 0.533 | 0.259
Decision Tree | 0.848 | 0.844 | 0.846 | 0.844 | 0.325
Random Forest | 0.872 | 0.873 | 0.872 | 0.873 | 0.432
Support Vector Machine | 0.871 | 0.884 | 0.875 | 0.884 | 0.419
Gradient Descent | 0.876 | 0.876 | 0.876 | 0.876 | 0.449
AdaBoost | 0.871 | 0.858 | 0.864 | 0.858 | 0.426
K-Neighbors | 0.856 | 0.829 | 0.840 | 0.829 | 0.355
Gradient Boosting | 0.863 | 0.865 | 0.864 | 0.865 | 0.392
Ensemble (multiple classifiers), accept | 0.933 | 0.933 | 0.933 | |
Ensemble (multiple classifiers), reject | 0.551 | 0.552 | 0.552 | |
Ensemble (multiple classifiers), average | 0.884 | 0.884 | 0.884 | 0.884 | 0.485

The bottom section of Table 4 shows the overall performance of the ensemble classifier, including metrics for the individual classes and its average performance. The upper nine rows of Table 4 show the average performance of each of the individual classifiers that make up our ensemble.

104 In addition to Logistic Regression, the ensemble included a Naïve Bayesian Classifier, a Decision Tree Classifier, a Random Forest Classifier, a Support Vector Machine Classifier, a Stochastic Gradient Descent Classifier, an AdaBoost Classifier, a K-Neighbors Classifier, and a Gradient Boosting Classifier. Supervised Learning, http://scikit-learn.org/stable/supervised_learning.html. 105 Id. 106 RUSSELL & NORVIG, supra note 2, at 748-52.



We need to be careful not to make too much of the accuracy numbers above. At first blush, accuracy scores approaching 90% seem quite impressive, but we must remember that a “dummy” classifier that always classifies every example as patent eligible would have an accuracy of about 85%, given the mix of our population. However, such a dummy classifier would have an MCC score of 0, because it would never correctly classify an ineligible claim. Therefore, we put more weight in the MCC score, which reached about 0.485 in the above test run. Note that using a voting ensemble did yield performance gains, as no individual classifier attained an MCC score over 0.449.
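
For illustration, the sketch below assembles a hard-voting ensemble over the nine classifier types listed in footnote 104 using scikit-learn's VotingClassifier; the hyperparameters shown are library defaults and are not the tuned settings used in this study.

    from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                                  RandomForestClassifier, VotingClassifier)
    from sklearn.linear_model import LogisticRegression, SGDClassifier
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import LinearSVC
    from sklearn.tree import DecisionTreeClassifier

    members = [
        ("logreg", LogisticRegression(max_iter=1000)),
        ("nbayes", MultinomialNB()),
        ("tree", DecisionTreeClassifier()),
        ("forest", RandomForestClassifier()),
        ("svm", LinearSVC()),
        ("sgd", SGDClassifier()),
        ("adaboost", AdaBoostClassifier()),
        ("knn", KNeighborsClassifier()),
        ("gboost", GradientBoostingClassifier()),
    ]
    ensemble = VotingClassifier(estimators=members, voting="hard")
    # ensemble.fit(X_train[keep], [y_train[i] for i in keep])
    # predictions = ensemble.predict(X_test)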

In the end, the best classifier is the one that does the best job of meeting the particular requirements of its application. Plot 7 shows us how to train classifiers to meet particular requirements. For example, if the classifier is to be used as an “early warning” system to flag patent claims that may have eligibility problems under Alice, then we would like a classifier that has a high recall score for ineligible claims. As can be seen in Plot 7, this will come at a loss of precision, meaning that the classifier will identify many false positives – claims classified as ineligible that are not actually ineligible. Of course, this loss of precision may be acceptable if it is really important to catch as many ineligible claims as possible.

E. Extensions, Improvements, and Future Work

Our machine learning process described above may be improved in many ways. First, other features could be considered. Currently, the only features being considered are term frequencies. Other features that are not currently being considered include claim length in words, the number of syntactic claim elements (e.g., the number of clauses separated by semi-colons), the number of gerunds (e.g., receiving, transmitting), or the like.
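
By way of illustration only, the following sketch computes the additional features just mentioned; none of them are used by the current classifier, and the gerund count is a rough suffix-based heuristic.

    def extra_claim_features(claim_text):
        words = claim_text.split()
        return {
            "length_in_words": len(words),
            "num_elements": claim_text.count(";") + 1,  # clauses separated by semi-colons
            "num_gerunds": sum(1 for w in words if w.lower().endswith("ing")),
        }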

Also, the current approach measures only the occurrences of single terms, using a “bag of words” approach.107 In this approach, each claim is reduced to a frequency table that associates each claim term with the number of times it occurs in the claim. Therefore, no information is retained about the location or co-occurrence of particular words. This issue can be addressed at least in part by the use of n-grams. An n-gram is a sequence of terms of length n that appear in the given text. For example, bigrams (2-grams) represent sequential pairs of words in the claim.

It is possible that an n-gram-based representation will provide additional features (e.g., specific 2- or 3-word sequences) that are highly correlated with eligible or ineligible claims. For example, “computer interface” might be differently correlated with eligible or ineligible claims than the word “computer” or “interface” taken individually.

107 RUSSELL & NORVIG, supra note 2, at 866.


This may be because, for example, the word “interface” has many definitions, including (1) the interaction between two entities or systems, (2) a device or software for connecting computer components, and (3) fabric used to make a garment more rigid.108 The first of these definitions is probably highly associated with ineligible claims, while the last of these definitions is probably highly associated with eligible claims, with the second definition likely appearing somewhere in the middle. By using 2-grams in this case, it may be possible to distinguish which of these three cases applies.

Other potential improvements include the use of a larger dictionary. The current approach typically utilizes just the most significant terms in the data set, generally around 1000 terms.109 Keeping the term dictionary small facilitates rapid classifier training, and thus the ability to experiment with many different classifier parameter settings. In a production setting, however, using a larger dictionary may yield marginal but useful performance improvements.110
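
A sketch combining the two extensions discussed above, an n-gram feature set and a TF-IDF-weighted dictionary capped at roughly 1000 terms (see footnote 109), is shown below; the parameter values are illustrative.

    from sklearn.feature_extraction.text import TfidfVectorizer

    # Unigrams and bigrams, keeping the 1000 highest-scoring terms.
    vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=1000)
    # X = vectorizer.fit_transform(stem_claim(c) for c in claim_texts)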

In this section, we have described the supervised training and performance of machine classifiers that are capable of predicting whether a given patent claim is valid under Alice. Using the Matthews Correlation Coefficient as our preferred metric, we have developed classifiers that obtain scores in excess of 0.40 when evaluated against test data held out from our overall dataset. In the following sections, we describe two applications for our machine classifier. First, in Section 7, we present a Web-based patent claim evaluation system that uses our classifier to predict Alice compliance for a given patent claim. Next, in Section 8, we use the classifier to quantitatively estimate the impact of Alice on the universe of issued patents.

7. A PATENT CLAIM EVALUATION SYSTEM

In the present section, we describe a Web-based patent claim evaluation system that employs our automatic predictive classification techniques described above. After presenting our system, we describe a number of use cases for the system within the context of the patent lifecycle, including application preparation, prosecution, enforcement, valuation, and transactions. We conclude with a brief outline of some of the issues presented by the use of “machine intelligence” in the context of legal work.

Our patent claim evaluation system is an example of a computer-assisted legal service, that is, the application of computer function and intelligence to the efficient rendering of legal services. While the legal field has been slow to adopt the efficiencies obtained from information technologies, it has not been immune to change.

108 Technically, this fabric is called “interfacing,” although if stemming is employed, then “interface” and “interfacing” will likely be stemmed to the same term. 109 Significance is determined using the TF-IDF (Term Frequency times Inverse Document Frequency) measure, which reduces the weight of words that are very common in a given corpus. JURE LESKOVEC, ANAND RAJARAMAN &

JEFFREY D. ULLMAN, MINING OF MASSIVE DATASETS 8 (2014), http://infolab.stanford.edu/~ullman/mmds/book.pdf. 110 Preliminary results do not show significant improvement with a larger dictionary. This may be due to our sample size in relation to the size of the dicationary – there are likely not enough samples to determine the true relationship between a rare feature (term) and an eligible or ineligible classification.


The word processing department in a law firm is gradually giving way to self-service word processing and speech recognition dictation software. While many attorneys still like to “look it up in the book,” legal research is increasingly being performed via computer. Manual document review during litigation is being replaced by electronic discovery, including machine classification to discover responsive documents. The claim evaluation system described below supports the analytic functions traditionally performed by an attorney, by helping the attorney identify problematic claims, much in the same way as a doctor would employ a medical test to identify disease.

A. System Description

We have implemented a proof-of-concept Web-based classification application (“the claim evaluator”) that can be used to evaluate patent claim text for subject matter eligibility. The claim evaluator receives a patent claim input into an HTML form presented in a Web browser. The claim text is stemmed and presented to an ensemble of machine classifiers trained as described above.111 The classifier ensemble provides a result that is presented on the displayed Web page.
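
A minimal sketch of such a front end is shown below, using the Flask micro-framework and assuming the trained ensemble, vectorizer, and stem_claim() helper from the earlier sketches; the actual application's routes, templates, and output (including the term highlighting described below) are more elaborate.

    from flask import Flask, render_template_string, request

    app = Flask(__name__)
    PAGE = """<form method="post"><textarea name="claim"></textarea>
              <input type="submit" value="Evaluate"></form><p>{{ verdict }}</p>"""

    @app.route("/", methods=["GET", "POST"])
    def evaluate():
        verdict = ""
        if request.method == "POST":
            stemmed = stem_claim(request.form["claim"])
            label = ensemble.predict(vectorizer.transform([stemmed]))[0]
            verdict = "Eligible" if label == "accept" else "Ineligible"
        return render_template_string(PAGE, verdict=verdict)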

Below is a screen shot showing an example output of the claim evaluator, given as input an example claim directed to a programmable computer memory system. This claim is obtained from U.S. Patent No. 5,953,740 and was not used to train our classifier. The evaluator classifies this as eligible, with five out of nine (55%) classifiers of the ensemble in agreement.

This claim was also analyzed on appeal by the CAFC in Visual Memory v. NVIDIA Corp.112 There, a three-judge panel held the claim to be patent eligible under Alice by a 2-1 vote. Interestingly, the 5-4 vote of our ensemble of classifiers indicates that this was also a somewhat close case for our classification system.

111 At the time of writing, the ensemble consists of the nine classifiers listed in Table 4 and footnote 104 above. The output of the evaluator is based on the “votes” of each of the classifiers. 112 Visual Memory LLC v. NVIDIA Corp., Case No. 2016-2254, slip op. at 15, 2017 U.S. App. LEXIS 15187, at *19 (Fed. Cir. August 15, 2017).


Screen Shot 1

At the bottom of the screen shown above, stemmed claim terms are highlighted red or green to respectively indicate a positive or negative association with patent eligibility under Alice.113 This feature can help the user redraft an example claim to recharacterize the invention at a lower level of abstraction. In the example above, the stemmed terms “comput,” “data,” “processor,” and “store” are colored red to indicate a correlation with patent ineligibility. Terms such as “oper,” “main,” “memori,” “configur,” “cache,” “connect,” and “determine” are colored green to indicate a correlation with patent eligibility. The remaining terms are colored black to indicate a lack of strong correlation in either direction.
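
The following sketch illustrates the coefficient-based highlighting described in footnote 113; it assumes the logistic regression member of the ensemble and the fitted vectorizer from the earlier sketches, that "reject" is the positive class, and an illustrative threshold of 0.5.

    def term_colors(stemmed_terms, logreg, vectorizer, threshold=0.5):
        vocab = vectorizer.vocabulary_   # maps each term to its column index
        weights = logreg.coef_[0]        # positive weights push towards "reject"
        colors = {}
        for term in stemmed_terms:
            weight = weights[vocab[term]] if term in vocab else 0.0
            if weight > threshold:
                colors[term] = "red"     # correlated with ineligibility
            elif weight < -threshold:
                colors[term] = "green"   # correlated with eligibility
            else:
                colors[term] = "black"
        return colors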

Below is a second screen shot of the claim evaluator. This time the evaluator has been asked to evaluate a claim directed to a method for providing a performance guarantee in a transaction. This claim is obtained from U.S. Patent No. 7,644,019.

113 The decision to highlight a term is based on a coefficient for that term determined by the logistic regression classifier of the ensemble. Logistic Regression, http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression.


Screen Shot 2

In Screen Shot 2, the evaluator classifies this claim as ineligible, with all nine classifiers in the ensemble in agreement. This result is consistent with our intuition that business-method type claims are more likely than memory system claims (as in Visual Memory, above) to be invalid under Alice. This result is also consistent with the decision of the CAFC, which held this claim invalid under Alice in buySAFE v. Google.114

B. Claim Evaluation System Use Cases

The described claim evaluator is first and foremost useful for helping understand whether it is even worth applying for a patent directed to a particular invention. There are many hurdles that must be crossed before obtaining a patent. It is frustrating for clients that a patent attorney cannot give reasonable assurances that he or she will be able to draft a patent application having claims that can overcome all of these hurdles. For example, while a prior art search can uncover at least some of the relevant prior art, it is very difficult to predict how an examiner might combine the teachings of multiple prior art references to generate a rejection for obviousness.

With the described evaluator, however, it is now at least possible to flag inventions that may be subjected to a higher level of scrutiny under the Alice subject matter test. For example, the client or the attorney can draft an abstract or overview of the invention, which is then used as input to the evaluator.

114 buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355 (Fed. Cir. 2014).


If the evaluator tags the given text as ineligible, the client has reason to be concerned.

The evaluator is also useful when preparing the claims of a patent application, and more generally for determining the preferred terms used to describe the invention in the application text. For example, the attorney may prepare a draft independent claim and provide it to the evaluator. Depending on the output of the evaluator, the attorney may revise the claim to use different terms, to describe the invention at a lower level of generality or abstraction, or to claim the invention from a different aspect. Also, one can imagine extending the function of the evaluator to suggest synonyms or related terms that are less highly correlated with Alice rejections.

The described evaluator can also be used to analyze issued patent claims. For example, a patentee could use the evaluator to determine whether an issued patent claim is likely to survive a challenge under Alice if asserted during litigation. As another example, a defendant or competitor could use the evaluator to assess the likelihood of success of challenging the validity of a claim under Alice in litigation or via a post-grant review.

It is also possible to imagine using the evaluator as part of an automated (or computer-assisted) patent analysis or valuation system. Patent analysis, whether performed as part of rendering an opinion of invalidity or determining a patent valuation, is expensive business.115 Legal fees paid for patent analysis are a source of high transaction costs faced by parties attempting to determine whether or how to manage the risk presented by a patent, such as via acquisition, license, or litigation. The described evaluator can be used to reduce these costs because it can perform an initial analysis in an automated manner. This analysis can be used as a data point in a patent valuation formula, as part of a due diligence checklist, or the like.

C. Questions Arising From the Application of Machine Intelligence to

the Law

The claim evaluation system described above demonstrates that machine learning can be employed to at least support the analytic functions traditionally performed by an attorney.116 The application of machine learning in this context of course raises a host of questions.

115 While non-compliance with 35 U.S.C. § 101 is only one of several bases for invalidity, our claim evaluator could still reduce the cost of a typical invalidity opinion, by flagging for further review and analysis claims that are likely invalid under Alice. American Intellectual Property Law Association, 2015 REPORT OF THE ECONOMIC SURVEY, p. I-95 (median legal fee to prepare an invalidity opinion is $10,000). 116 We are not the first to make this claim. For example, automatic document classification employed in the context of electronic discovery has been shown to be at least as accurate as human review. Roitblat, et al., Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review, 61 JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY 70, 79 (2009), http://apps.americanbar.org/litigation/committees/corporate/docs/2010-cle-materials/09-holding-line-reasonableness/09c-document-categorization.pdf


While answers to many of these questions are beyond the scope of this article, we will briefly address some of them below.

The first area of concern relates to the substitutability of machine intelligence for human analysis. To address this issue, we first need to understand whether the performance of the machine intelligence is even comparable to a trained human analyst. Above, we have compared the performance of our “proof of concept” classifier to the aggregated performance of the human examination corps of the Patent Office. While we find that our classifier is certainly better than guessing, it is still wrong over 10% of the time. Is this an acceptable margin of error? The answer to that question depends on the context. If the classifier is used to give a client an initial “heads up” regarding a patentability issue (possibly without the cost of consulting an attorney), then the answer is probably yes. If the classifier is used to determine whether to file a multi-million dollar lawsuit, then the answer may be no.

The possibility of failure raises the specter of legal malpractice. It is certainly possible to construct a hypothetical in which an attorney is found liable for malpractice for relying on an incorrect assessment provided by a machine classifier or other “intelligent” agent. Yet we think the benefits to clients outweigh the risks, so long as the client is made to understand the uncertainties that accompany any prediction.

In some ways, using our machine classifier is not so different from performing a prior art search prior to preparing a patent application. Many clients request searches in order to identify any “knock out” prior art. As an initial matter, no patent attorney will assert that the results of a particular search will guarantee smooth sailing before the Patent Office. There are simply too many unknowns, including the limitations of search engines and the fact that patent applications remain non-public and thus unsearchable for at least 18 months after filing.117 If knock out art is found, the client may elect to redirect her legal resources to a different project. And if instead the client elects to move forward, the patent attorney can use the search results to obtain a better understanding of the prior art, which should result in claims that require fewer interactions with the Patent Office, thereby also saving the client legal fees. Machine classification to identify Alice-related issues early in the process can similarly be used to help a client decide where to best apply their limited resources, and further to help an attorney craft better claims in borderline cases.

The second area of concern and inquiry is more philosophical in nature. What does it mean when our classifier determines that a particular claim is not patent eligible? How can we rely on a system that first reduces an English language sentence, with all of its syntactic structure and semantic import, to a bag of stemmed tokens, and then feeds those tokens to a mathematical model that computes a thumbs up or down, without any understanding of language and without any explicit instruction in the rules of the Alice test?

117 An application publishes 18 months after its earliest filing date for which benefit is sought. 35 U.S.C. § 122(b). In addition, when the applicant files a non-publication request, the application will not publish until and unless it issues into a patent. 37 C.F.R. § 1.211.


Questions such as these have been debated since the earliest attempts to build intelligent machines.118

We take no position on whether our classifier understands language or can otherwise be considered intelligent. At a minimum, our classifier reasonably approximates the behavior of an aggregated group of human patent examiners that are doing their best to implement the Alice test in the examination context. And while there may be nothing like understanding or rule processing in our classifier, this does not mean that it does not have useful, practical applications.

In this section, we have presented a claim evaluation system that employs our machine classifier. We have also discussed its potential applications in the context of rendering legal services in a post-Alice world. We have concluded by briefly outlining some of the concerns and issues related to the use of machine intelligence in the context of legal work. We conclude that our application of machine intelligence can support the traditional analytic functions provided by attorneys, while at the same time allowing clients to make better informed decisions about the application of their limited economic resources.

8. ESTIMATING THE IMPACT OF ALICE ON ISSUED PATENTS

In this section, we use our approach to estimate the impact of the Alice decision on the millions of in-force patents. For this article, we estimated the number of patents invalidated under Alice by classifying claims from a sample of patents issued prior to the Alice decision. To perform the evaluation, we employed the following approach. First, we trained a machine classifier as discussed above. Second, we determined whether our classifier could serve as an acceptable proxy for the decision making of the Federal Courts. Third, we evaluated the first independent claim from one percent of the issued patents in our patent corpus: about 40,000 patents issued between 1996 and 2016.

A. The Classifier

For this analysis, we used a Logistic Regression classifier. We used a Logistic Regression classifier because, compared to the ensemble classifier discussed above, it is quick to train and can efficiently process the large number of patent claims in our sample. In addition, the performance of the Logistic Regression classifier is not much worse than that of our ensemble classifier.119 The classifier was trained on a 1:1 ratio of eligible to ineligible examples. As shown in Plot 7, this ratio results in a classifier with roughly equal precision and recall scores for the ineligible class. Note that such a classifier is somewhat aggressive with respect to classifying claims as ineligible.

118 See e.g., John R. Searle, Minds, Brains, and Programs, in MIND DESIGN 282-306 (John Haugeland ed., 1981); HUBERT DREYFUS, WHAT COMPUTERS STILL CAN’T DO: A CRITIQUE OF ARTIFICIAL REASON (1992). 119 The ensemble discussed with respect to Table 4 has an MCC score of 0.485 while the Logistic Regression classifier has a score of 0.448. While the ensemble is better, its use is not necessary to obtain a rough estimate of the number of invalid patents.


Such a classifier is “good” at finding ineligible claims, at the expense of additional false positives (claims classified as ineligible that are actually eligible).120
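For readers interested in the mechanics, the following is a minimal sketch of how such a classifier might be trained. It is illustrative only, not our production pipeline: the variable names (claims_eligible, claims_ineligible) and the use of NLTK's Porter stemmer with scikit-learn are assumptions made for the example; any comparable stemmer and bag-of-words vectorizer would do.

# Minimal illustrative sketch (not the article's production pipeline):
# train a Logistic Regression classifier on a bag of stemmed tokens,
# using a 1:1 ratio of eligible to ineligible training claims.
import random

from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

stemmer = PorterStemmer()
base_analyzer = CountVectorizer().build_analyzer()

def stemmed_tokens(claim_text):
    # Reduce a claim to a bag of stemmed tokens.
    return [stemmer.stem(token) for token in base_analyzer(claim_text)]

# claims_eligible and claims_ineligible are hypothetical lists of claim texts,
# assumed to have been extracted from the Patent Office training data.
sample_eligible = random.sample(claims_eligible, len(claims_ineligible))
texts = sample_eligible + claims_ineligible
labels = [0] * len(sample_eligible) + [1] * len(claims_ineligible)  # 1 = ineligible

classifier = make_pipeline(
    CountVectorizer(analyzer=stemmed_tokens),
    LogisticRegression(max_iter=1000),
)
classifier.fit(texts, labels)

Down-sampling the majority (eligible) class, rather than re-weighting it, mirrors the 1:1 training ratio described above.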

B. Classifier Validation

We next determined whether our classifier could serve as an acceptable proxy for the decision making of the Federal Courts. Our classifier was trained based upon decisions made by examiners at the Patent Office. It is thus natural to ask whether such a classifier replicates the decision making of judges in the Federal Courts. If it does not, then it is unlikely that our classifier can tell us with any precision how many patents have been invalidated under Alice.

To validate the performance of our classifier, we evaluated claims from post-Alice cases appealed to the Court of Appeals for the Federal Circuit (CAFC). The Patent Office maintains a record of subject matter eligibility court decisions.121 From this record, we obtained a list of patents that had been the subject of appeals heard by the CAFC in the post-Alice timeframe. The list included 77 patents, each associated with an indicator of whether the patent claim at issue was held eligible or ineligible. We then pulled the relevant independent patent claims from each patent in our list.122 Of these 77 claims, the CAFC held that 63 (82%) were not directed to patent-eligible subject matter.
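As a rough illustration of this step, the chart of court decisions (note 121) can be loaded directly from the spreadsheet published by the Patent Office. The column names used below are hypothetical placeholders, not the spreadsheet's actual headers, and should be checked against the file before use.

# Minimal sketch: load the USPTO chart of subject matter eligibility court
# decisions and extract patent numbers and holdings. Column names
# ("Patent No.", "Holding") are hypothetical placeholders.
import pandas as pd

CHART_URL = ("https://www.uspto.gov/sites/default/files/documents/"
             "ieg-sme_crt_dec.xlsx")

chart = pd.read_excel(CHART_URL)
cafc_cases = chart[["Patent No.", "Holding"]].dropna()
ineligible = cafc_cases[cafc_cases["Holding"].str.contains("ineligible", case=False)]
print(len(cafc_cases), "patents;", len(ineligible), "held ineligible")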

After training, we next evaluated the Federal Circuit claims using our classifier. Table 5, below, compares the performance of our classifier on test claims drawn from our Patent Office dataset to its performance on the claims of the Federal Circuit dataset. Note that the classifier was not exposed to any of these claims during training. In the case of the Patent Office data set, the test claims were held out and not used during training.

120 From Plot 7, such a classifier has an ineligible recall rate of about 0.80 but an ineligible precision of around 0.35. 121 USPTO, Chart of Subject Matter Eligibility Court Decisions, https://www.uspto.gov/sites/default/files/documents/ieg-sme_crt_dec.xlsx, updated July 31, 2017. 122 For some patents, the Chart of Subject Matter Eligibility Court Decisions identifies the specific claims analyzed by the court. For these patents, we pulled the first independent claim from the list of identified claims. For other patents, we used claim 1 as the representative claim.


Table 5: Classifier Performance for Patent Office and Federal Circuit Data

Test Dataset      Class                 Precision   Recall   F-score   MCC
Patent Office     Eligible (n=3776)       0.96       0.77     0.85
                  Ineligible (n=563)      0.34       0.80     0.47
                  Average                 0.88       0.77     0.80     0.412
Federal Circuit   Eligible (n=14)         0.30       0.57     0.39
                  Ineligible (n=63)       0.88       0.70     0.78
                  Average                 0.77       0.68     0.71     0.218
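The per-class figures in Table 5 are standard classification metrics. A minimal sketch of how they could be reproduced, assuming arrays y_true and y_pred holding the true labels and the classifier's predictions (with 1 denoting ineligible), is shown below.

# Minimal sketch: compute the Table 5 metrics with scikit-learn.
# y_true and y_pred are assumed arrays of 0/1 labels (1 = ineligible).
from sklearn.metrics import classification_report, matthews_corrcoef

print(classification_report(y_true, y_pred,
                            target_names=["eligible", "ineligible"]))
print("MCC:", matthews_corrcoef(y_true, y_pred))

The Matthews correlation coefficient (MCC) is reported once per dataset because it summarizes performance across both classes in a single number.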

We can make several observations about the above results. First, for the Federal Circuit data, we do not make much of the recall and precision scores for the patent eligible class, because the collection does not truly reflect the landscape of litigated patent claims. Specifically, the Federal Circuit data is skewed in the opposite direction from our training data. Our training data includes about 13% ineligible claims, whereas the Federal Circuit dataset includes about 80% ineligible claims. This should come as no surprise, as there is a powerful selection bias at work in the Federal Circuit dataset. In particular, the Federal Circuit dataset only includes cases where the issue of subject matter eligibility was raised at trial and ultimately appealed to the CAFC. It does not include examples from the hundreds if not thousands of patent cases where the issue was never even raised during the suit.123 There are thus many likely patent-eligible claims that do not appear in the Federal Circuit dataset.124

We are most interested in the recall rate for the ineligible class. The recall rate for ineligible claims reflects the classifier’s ability to find ineligible claims, and thus invalid patents. The classifier correctly identified about 70% of the ineligible claims in the Federal Circuit dataset, which is not so different from the ineligible recall rate of 80% on our Patent Office test data. The classifier also appears to do well in terms of ineligible precision in the Federal Circuit dataset, but to a large degree this number simply reflects the fact that the data is skewed heavily in favor of ineligible claims.

In the end, the best we can say at this point is that our classifier reasonably replicates the ability of CAFC judges to find ineligible patent claims. While the Federal Circuit dataset is simply too small and skewed to draw any deeper conclusions, this result at least gives us confidence that we can use the classifier to roughly estimate the number of patents made ineligible under Alice, as discussed further below.

123 At least 4000 patent lawsuits have been filed in each year of the period 2012-2016. Jacqueline Bell, Patent Litigation in US District Courts: A 2016 Review, LAW 360 (March 1, 2017), https://www.law360.com/articles/895435/patent-litigation-in-us-district-courts-a-2016-review. 124 Future work is directed to obtaining a higher quality Federal Court dataset that includes claims from the many litigated patents where the issue of eligibility was never raised, or was raised and resolved in favor of the patentee.


C. Evaluation of Issued Patent Claims

We next turned our classifier to a 1% sample of the patents in our patent document corpus. Our corpus contains about 4 million utility patents issued between 1996 and 2016. The sample therefore contains about 40,000 patents. Plot 8, below, shows the predicted invalidity rate by year, as produced by our classifier.

Plot 8: Predicted Invalidity Rate by Year
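As a rough sketch of the underlying computation, assuming the sampled claims are held in a pandas DataFrame named sample with issue_year and claim_text columns (all names are illustrative), the per-year rate plotted above amounts to a simple group-by:

# Minimal sketch: predicted invalidity rate by issue year for the 1% sample.
# "sample" is assumed to be a pandas DataFrame with "issue_year" and
# "claim_text" columns (illustrative names), and "classifier" is the trained
# pipeline sketched above.
sample["predicted_ineligible"] = classifier.predict(sample["claim_text"])
invalidity_by_year = sample.groupby("issue_year")["predicted_ineligible"].mean()
print(invalidity_by_year)  # fraction of sampled claims classified ineligible, per year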

Plot 8 has a number of interesting features. As an initial matter, the graph shows a predicted invalidity rate in excess of 10% for most years. This number is too high, for reasons that will be discussed below. At this point we are more interested in year-over-year changes than in the exact predicted rate. First, the graph shows a marked drop in the invalidity rate after 2014, which coincides with the implementation of the Alice-review standards within the Patent Office. Second, between 1996 and 2014, there is a clear upward trend in the predicted invalidity rate. This trend appears to be due at least in part to the rise of computer-related industry sectors over time. As shown above, in Plot 4 and Section 5, the Alice decision has disproportionately impacted computer-related technologies compared to, for example, the mechanical arts. Over the last two decades, the share of issued patents directed to computer-related technologies has increased, which explains at least some of the rise in the predicted invalidity rate. This effect can be seen in Plot 9, below.


Plot 9: Predicted Invalidity Rate and Classes G06F and G06Q Over Time

Plot 9 shows the predicted invalidity rate (top line) along with the yearly share for classes G06F (digital data processing, middle line)125 and G06Q (business methods, lower line).126 The yearly share is the fraction of issued patents in a given class for a given year. Between 1996 and 2015, class G06F increased its share from about 4% to about 11% of the total yearly number of issued patents. During the same period, class G06Q similarly rose from a share of less than 0.5% to a share of about 2%. Notably, the share of class G06Q, where over 60% of the applications were rejected under Alice,127 declined by about 50% after 2014.128 It is likely that this decline is a result of the heightened scrutiny under Alice faced by applications in this class.

In order to estimate the number of patents invalidated under Alice, we need an accurate estimate of the invalidity rate. Our classifier predicts an invalidity rate for patents issued during the pre-Alice period of 1996-2013 of about 13%. This number seems too high on its face. But this should not be a surprise when we consider the precision and recall rates of our classifier. For the ineligible class, the precision and recall rates are 0.34 and 0.80, respectively. The precision score tells us that for every 100 claims classified as ineligible, only 34 are correctly classified. The recall score tells us that those correctly classified claims represent only about 80% of the total population of ineligible claims. Thus, starting with the predicted invalidity rate of 13%, it seems safe to say that 5% (≈ 13% x 0.34 / 0.80) is a more accurate number.
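The correction can be stated compactly. Using notation introduced here only for illustration, let $\hat{r}$ be the classifier's predicted invalidity rate, $p$ its ineligible precision, and $c$ its ineligible recall. The share of the sample that is both predicted and actually ineligible is $\hat{r}p$, and that share captures only a fraction $c$ of the truly ineligible claims, so the adjusted rate is:

$$ r \;\approx\; \hat{r}\cdot\frac{p}{c} \;=\; 0.13 \times \frac{0.34}{0.80} \;\approx\; 0.055 \approx 5\%. $$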

125 CPC class G06F is entitled ELECTRICAL DIGITAL DATA PROCESSING. 126 CPC class G06Q is entitled DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; … . 127 Plot 4, supra. 128 Specifically, from 2.3% in 2014 to 0.8% in 2015 and 1.2% in 2016.


Next, we estimate the total number of in-force patents issued during the pre-Alice period prior to 2014. One accounting estimates that there were about 2.5 million patents in force in 2014.129 We reduce this number to 2 million to exclude patents issued during 2014.130

Assuming a 5% invalidity rate and about 2 million patents in force at the time of the Alice decision, we estimate that about 100,000 patents have at least one claim that is likely invalid under Alice. How reliable is this estimate? It is of course possible that the classifier overestimated the number of ineligible claims. As discussed above, we have attempted to account for the classifier’s bias by reducing the original estimate of 13% to 5%. Even if our process still grossly overestimates (e.g., by a factor of two) the number of invalid patents, the total would still number around 50,000 patents.

If anything, our process may be conservative compared to the application of Alice by the Federal Courts. One study indicates that, as of June 2016, over 500 patents had been challenged under Alice, with a resulting invalidation rate exceeding 65%.131 An earlier study of over 200 Federal Court decisions showed an invalidation rate of over 70%.132 Of course, these studies focus only on cases where a defendant has moved to invalidate a patent under Alice, and are thus skewed toward cases selected from suspect subject matter areas, such as business methods, advertising, software, and the like.

The outcome of our analysis is even more profound in specific subject matter areas. For example, over 80% of about 400 pre-2014 claims in CPC class G06Q (data processing systems/methods for administrative, commercial, financial, and managerial purposes) were classified as ineligible by the classifier.133 There are over 50,000 issued patents in this class. If we scale the 80% number down to 35%, as above, to account for the classifier’s bias toward invalidity,134 this still means that the Supreme Court may well have invalidated over 15,000 patents in this class alone. Perhaps this impact was the Court’s intent, although it certainly seems an extreme realignment of property rights at the stroke of a judge’s pen.

129 Dennis Crouch, The Number of U.S. Patents in Force, PATENTLY-O (October 23, 2014) (blog post estimating about 2.5 million U.S. patents in force in 2014), https://patentlyo.com/patent/2014/10/number-patents-force.html. 130 By 2014, the patent office was issuing roughly 300,000 patents per year, which justifies reducing our estimate to 2 million from 2.5 million. USPTO, U.S. Patent Statistics Chart Years 1963-2015, https://www.uspto.gov/web/offices/ac/ido/oeip/taf/us_stat.htm. 131 Tran, supra note 10, at 358. 132 Robert Sachs, #alicestorm: When it rains it pours…, BILSKI BLOG (January 22, 2016) (blog post analyzing 208 Federal Court Alice decisions), http://www.bilskiblog.com/blog/2016/01/alicestorm-when-it-rains-it-pours.html. 133 As a sanity check, the classifier processed a sampling of 3731 pre-2014 claims from CPC class F02B (internal combustion piston engines) and predicted only 184 (about 5%) to be patent ineligible. This result again matches our intuition that claims that are mechanical in nature ought to be more likely to be patent eligible. 134 We are conservatively scaling the predicted rate by 0.4 based on ineligible precision and recall rates of 0.34 and 0.80, respectively. Note that the justification for scaling in this context is not as strong as for the average case (general population), because class G06Q undoubtedly contains a higher than average ratio of ineligible to eligible claims.


9. CONCLUSION

We have shown that it is possible to predict, with a reasonably high degree of confidence, whether a patent claim is patent eligible under the Alice test. This prediction is based on thousands of decisions made by human patent examiners charged with implementing the Alice test during the patent examination process. The approach developed in this article has many practical applications, including providing support for patent practitioners and clients during patent preparation, prosecution, assertion, and valuation. Using machine intelligence to identify Alice-related validity issues may yield economic efficiencies, by diverting legal fees away from non-patentable inventions, by improving claims and thereby streamlining the interactions between applicants and examiners, and by reducing baseless litigation of invalid patent claims. Our use of machine intelligence can assist and improve the analytic functions provided by attorneys, while at the same time providing clients with better information for deciding how to allocate and apply legal resources.

We have also used our approach to quantitatively estimate, for the first time, how many issued patents have been invalidated under Alice, thereby demonstrating the profound and far-reaching impact of the Supreme Court’s recent subject matter eligibility jurisprudence on the body of granted patents. Specifically, by invalidating at least tens of thousands of issued patents, the Court’s actions represent a judicial remaking of patent law that has resulted in a considerable realignment of existing intellectual property rights.