
MECHANIZING ALICE:

AUTOMATING THE SUBJECT MATTER

ELIGIBILITY TEST OF ALICE V. CLS BANK1

Ben Dugan†

Abstract

This Article describes a project to mechanize the subject matter eligibility

test of Alice v. CLS Bank. The Alice test asks a human to determine whether or

not a patent claim is directed to patent-eligible subject matter. The core research question addressed by this Article is whether it is possible to automate

the Alice test. Is it possible to build a machine that takes a patent claim as input

and outputs an indication that the claim passes or fails the Alice test? We show that it is possible to implement just such a machine, by casting the Alice test as

a classification problem that is amenable to machine learning.

This Article describes the design, development, and applications of a machine classifier that classifies patent claims according to the Alice test. We employ supervised learning to train our classifier with examples of eligible and ineligible claims obtained from patent applications examined by the U.S. Patent Office. In an example application, the classifier is used as part of a patent claim evaluation system that provides a user with feedback regarding the subject matter eligibility of an input patent claim. Finally, we use the classifier to

quantitatively estimate the impact of Alice on the universe of issued patents.

TABLE OF CONTENTS

I. Introduction and Overview
   A. Organization of the Article
   B. Brief Review of the Alice Framework
II. Rendering Legal Services in the Shadow of Alice
   A. Intuition-Based Legal Services
   B. Data-Driven Patent Legal Services
   C. Predicting Subject Matter Rejections Yields Economic Efficiencies
III. Data Collection Methodology
IV. Data Analysis Results
V. Predicting Alice Rejections with Machine Classification
   A. Word Clouds
   B. Classifier Training
   C. Performance of a Baseline Classifier
   D. Performance of an Improved Classifier
   E. Extensions, Improvements, and Future Work
VI. A Patent Claim Evaluation System
   A. System Description
   B. Claim Evaluation System Use Cases
   C. Questions Arising From the Application of Machine Intelligence to the Law
VII. Estimating the Impact of Alice on Issued Patents
   A. The Classifier
   B. Classifier Validation
   C. Evaluation of Issued Patent Claims
VIII. Conclusion

1. An early discussion draft of this Article appeared as Estimating the Impact of Alice v. CLS Bank Based

on a Statistical Analysis of Patent Office Subject Matter Rejections (February 23, 2016). Available at SSRN:

https://ssrn.com/abstract=2730803. This Article significantly refines the statistical analysis of subject matter

rejections at the Patent Office. This Article also clarifies the performance results of our machine classifier, and

better accounts for classifier performance when estimating the number of patents invalidated under Alice v. CLS

Bank.

† Member, Lowe Graham Jones, PLLC. Affiliate Instructor of Law, University of Washington School

of Law. Opinions expressed herein are those of the author only. Copyright 2017 Ben Dugan. I would like to

thank Professor Bob Dugan and Professor Jane Winn for their feedback, advice, and support, and Sarah Dugan

for her love and encouragement.



I. INTRODUCTION AND OVERVIEW

In Alice v. CLS Bank, the Supreme Court established a new test for

determining whether a patent claim is directed to patent-eligible subject matter.2

The impact of the Court’s action is profound: the modified standard means that

many formerly valid patents are now invalid, and that many pending patent

applications that would have been granted under the old standard will now not

be granted.3

This Article describes a project to mechanize the subject matter eligibility

test of Alice v. CLS Bank. The Alice test asks a human to determine whether or

not a patent claim is directed to patent-eligible subject matter.4 The core

research question addressed by this Article is whether it is possible to automate

the Alice test. Is it possible to build a machine that takes a patent claim as input

and outputs an indication that the claim passes or fails the Alice test? We show

that it is possible to implement just such a machine, by casting the Alice test as

a classification problem that is amenable to machine learning.

This Article describes the design, development, and applications of a

machine classifier that approximates the Alice test. Our machine classifier is a

2. Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 134 S. Ct. 2347, 2354–55 (2014).

3. See Robert Sachs, #Alicestorm: When it Rains, It Pours…, BILSKIBLOG (Jan. 22, 2016)

http://www.bilskiblog.com/blog/2016/01/alicestorm-when-it-rains-it-pours.html [hereinafter Sachs]

(illustrating that under Alice, the courts have invalidated a patent claim 72% of the time).

4. Alice, 134 S. Ct. at 2359.


computer program that takes the text of a patent claim as input, and indicates

whether or not the claim passes the Alice test. We employ supervised machine

learning to construct the classifier.5 Supervised machine learning is a technique

for training a computer program to recognize patterns.6 Training comprises

presenting the program with positive and negative examples, and automatically

adjusting associations between particular features in those examples and the

desired output.7
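By way of illustration only, the following is a minimal Python sketch of this kind of supervised training, using the scikit-learn library; the two example claims and their labels are hypothetical placeholders rather than entries from the dataset described later in this Article.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training examples: claim text paired with an eligibility
# label (1 = passes the Alice test, 0 = fails). These are placeholders,
# not entries from the dataset described in Section III.
claims = [
    "A method of hedging settlement risk comprising receiving data and calculating an exposure value.",
    "An apparatus comprising a sensor coupled to a processor configured to adjust a valve based on sensed pressure.",
]
labels = [0, 1]

# The vectorizer extracts word-frequency features; training the linear
# model adjusts the weights that associate particular claim terms with
# the eligible or ineligible outcome.
classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
classifier.fit(claims, labels)

# The trained model can then be asked to classify an unseen claim.
print(classifier.predict(["A system comprising a memory storing account data and a processor."]))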

The examples we use to train our machine classifier are obtained from the

United States Patent Office. Within a few months of the Alice decision,

examiners at the Patent Office began reviewing claims in patent applications for

subject matter compliance under the new framework.8 Each decision of an

examiner is publicly reported in the form of a written office action.9 We

programmatically obtained and reviewed many thousands of these office actions

to build a data set that associates patent claims with corresponding eligibility

decisions.10 We then used this dataset to train, test, and validate our machine

classifier.11

A. Organization of the Article

This Article is organized in the following manner. In Section I.B, we

provide an overview of the Alice framework for determining the subject matter

eligibility of a patent claim. The Alice test first asks whether a given patent

claim is directed to a non-patentable law of nature, natural phenomenon, or

abstract idea.12 If so, the claim is not patent eligible unless the claim recites

additional elements that amount to significantly more than the recited non-

patentable concept.13

In Section II, we motivate a computer-assisted approach for rendering legal

advice in the context of Alice. Alice creates a new patentability question that

must be answered before and during the preparation, prosecution, and

enforcement of a patent.14 Section II provides inspiration for a data-driven,

5. STUART RUSSELL & PETER NORVIG, ARTIFICIAL INTELLIGENCE: A MODERN APPROACH 693–95 (3d

ed. 2010) [hereinafter RUSSELL & NORVIG].

6. Id.

7. Id.

8. See, e.g., Memorandum from Deputy Commissioner Andrew H. Hirshfeld on Preliminary Examination

Instructions in View of the Supreme Court Decision in Alice Corporation Pty. Ltd. v. CLS Bank International, et al.

to Patent Examining Corps (June 25, 2014), http://www.uspto.gov/sites/default/files/patents/announce/alice_

pec_25jun2014.pdf [hereinafter Preliminary Examination Instructions] (providing a two-part analysis for

abstract ideas); see generally Subject Matter Eligibility, U.S. PAT. & TRADEMARK OFF. (Dec. 14, 2016, 11:38

PM) https://www.uspto.gov/patent/laws-and-regulations/examination-policy/subject-matter-eligibility

(discussing subject matter eligibility).

9. 37 C.F.R. § 1.104 (2015). See 35 U.S.C. § 132 (2012); MANUAL OF PATENT EXAMINING PROCEDURE

§ 706 (9th ed. rev. 7, Nov. 2015) (stating the procedure for rejection and reexamination).

10. Subject Matter Eligibility Court Decisions, U.S. PAT. & TRADEMARK OFF. (Apr. 22, 2016),

https://www.uspto.gov/sites/default/files/documents/ieg-may-2016-sme_crt_dec_0.pdf.

11. E.g., Ben Dugan, Ask Alice!, https://www.lawtomata.com/predict (last visited Mar. 12, 2018)

[hereinafter Dugan] (this site provides access to an example machine classifier constructed by the Author using

some of the techniques described in this paper).

12. Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 134 S. Ct. 2347, 2354 (2014).

13. Id. at 2354–56.

14. See id. at 2355 (discussing the framework for the patentability question).


computer-assisted, predictive approach for efficiently answering the Alice

patentability question. Such a predictive approach can be usefully performed at

various stages of the lifecycle of a patent, including during initial invention

analysis, application preparation and claim development, and litigation risk

analysis. Computer-assisted prediction of Alice rejections stands in contrast to

traditional, intuition-driven methods of legal work, and can yield considerable

economic efficiencies by eliminating the legal fees associated with the

preparation and prosecution of applications for un-patentable inventions, or by

eliminating baseless litigation of invalid patent claims.15 In addition, a

predictive approach can be used to assist a patent practitioner in crafting patent

claims that are less likely to be subjected to Alice rejections, thereby reducing

the number of applicant-examiner interactions and corresponding legal fees

during examination.16

In Section III, we describe our data collection methodology. Section III

lays out our process for generating a dataset for training our machine classifier.

In brief, we automatically download thousands of patent application file

histories, each of which is a record of the interaction between a patent examiner

and an applicant. From these file histories we extract office actions, each of

which is a written record of an examiner’s analysis and decision of a particular

application. We then process the extracted office actions to determine whether

the examiner has accepted or rejected the claims of the application under Alice.

Finally, we construct our dataset with the obtained information. Our dataset is

a table that associates, in each row, a patent claim with an indication of whether

the claim passes or fails the Alice test, as decided by a patent examiner.

In Section IV, we present results from an analysis of our dataset. Our

analysis identifies trends and subject matter areas that are disproportionately

subject to rejections under Alice. Our dataset shows that the subject matter areas

that contain many applications with Alice rejections include data processing,

business methods, games, educational methods, and speech processing. This

result is consistent with the focus of the Alice test on detecting claims that are

directed to abstract ideas, including concepts such as economic practices,

methods of organizing human activity, and mathematical relationships.

In Section V, we build a machine that is capable of predicting whether a

claim is likely to pass the Alice test. In this Section, we initially perform an

analysis that identifies particular words that are associated with eligibility or

ineligibility under Alice. The presence of such associations indicates that there

exist patterns that can be learned by way of machine learning. Next, we describe

the training, testing, and performance of a baseline classifier. Our classifiers are

trained in a supervised manner using as examples the thousands of subject matter

patentability decisions made by examiners at the Patent Office. We then

describe an improved classifier that uses an ensemble of multiple distinct

classifiers to improve upon the performance of our baseline classifier. We

15. See generally Sachs, supra note 3 (illustrating that under Alice, the courts have invalidated a patent

claim 72% of the time).

16. Sarah Garber, Avoiding Alice Rejections with Predictive Analytics, IPWATCHDOG (May 31, 2016),

http://www.ipwatchdog.com/2016/05/31/avoiding-alice-rejections-predictive-analytics/id=69519/.


conclude this Section with a brief outline of possible extensions, improvements,

and future work.
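As a rough illustration of what an ensemble of distinct classifiers can look like in code, the following Python sketch combines several standard text classifiers by majority vote using scikit-learn; the particular component models are illustrative assumptions and are not necessarily those evaluated in Section V.

from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Three distinct text classifiers, combined by majority ("hard") vote.
# The component models shown here are illustrative choices only.
ensemble = VotingClassifier(
    estimators=[
        ("logreg", make_pipeline(TfidfVectorizer(), LogisticRegression())),
        ("nbayes", make_pipeline(TfidfVectorizer(), MultinomialNB())),
        ("svm", make_pipeline(TfidfVectorizer(), LinearSVC())),
    ],
    voting="hard",
)

# ensemble.fit(training_claims, training_labels) would train every
# component; ensemble.predict(new_claims) returns the majority decision.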

In Section VI, we describe a claim evaluation system. The system is a

Web-based application that takes a patent claim as input from a user, and

provides the text of the claim to a back-end classifier trained as described above.

The system provides the decision of the classifier as output to the user. It is

envisioned that a system such as this can be used by a patent practitioner to

provide improved Alice-related legal services at various stages of the lifecycle

of a patent, as discussed in Section II.
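A minimal sketch of such a Web-based front end, assuming the Flask framework and a previously trained and serialized classifier, might look as follows; the endpoint name, JSON field names, and model file path are illustrative assumptions rather than a description of the system of Section VI.

import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)

# The model file name is a hypothetical placeholder; the classifier is
# assumed to have been trained and serialized ahead of time.
classifier = joblib.load("alice_claim_classifier.joblib")

@app.route("/evaluate", methods=["POST"])
def evaluate_claim():
    claim_text = request.get_json()["claim"]
    prediction = classifier.predict([claim_text])[0]
    return jsonify({"claim": claim_text, "passes_alice_test": bool(prediction)})

# Running app.run() would serve the evaluator locally; a user (or a
# front-end page) posts a claim and receives the classifier's decision.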

In Section VII, we utilize our machine classifier to quantitatively estimate

the impact of Alice on the universe of issued patents. While other studies have

tracked the actual impact of Alice in cases before the Federal Courts, our effort

is the first to use a machine classifier to quantitatively estimate the impact of

Alice on the entire body of issued patents.17 To obtain our estimate, we first

determine whether our classifier can be used as a proxy for the decision-making

of the Federal Courts. Since our classifier is trained based on decisions made by

examiners at the Patent Office, it is natural to ask whether the classifier

reasonably approximates the way that the courts apply the Alice test. To answer

this question, we evaluate the performance of our classifier on patent claims that

have been analyzed by the Court of Appeals for the Federal Circuit. The results

of this evaluation show that the outputs produced by our classifier are largely in

agreement with the decisions of the CAFC.18

Finally, we turn our classifier to the task of processing claims from a

random sample of 40,000 issued patents dating back to 1996. Extrapolating the

results obtained from our sample, we estimate that as many as 100,000 issued

patents have been invalidated due to the reduced scope of patent-eligible subject

matter under Alice v. CLS Bank.19 This large-scale invalidation of patent rights

represents a significant realignment of intellectual property rights at the stroke

of a judge’s pen.
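For readers interested in the arithmetic, the extrapolation step reduces to a simple proportion. Writing $n$ for the sample size (here 40,000), $k$ for the number of sampled patents that the classifier flags as ineligible, and $N$ for the total number of issued patents in the population, the raw point estimate takes the form

\hat{p} = \frac{k}{n}, \qquad \hat{N}_{\text{invalid}} = \hat{p} \cdot N,

where $k$, $\hat{p}$, and $N$ are generic symbols rather than the specific figures reported in Section VII; Section VII further adjusts this raw estimate to account for classifier error, as noted in footnote 1.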

B. Brief Review of the Alice Framework

The following procedure outlines the current test for evaluating a patent

claim for subject matter eligibility under 35 U.S.C. § 101. We will refer to this

test as the “Alice test,” although it was earlier articulated by the Supreme Court

in Mayo Collaborative Services v. Prometheus Laboratories, Inc.20

17. Jasper Tran, Two Years After Alice v. CLS Bank, 98 J. PAT. & TRADEMARK OFF. SOC’Y 354, 358

(2016) (using Federal Court decisions to estimate the impact of Alice on computer-related patents).

18. See infra Section VII.B (validating the performance of our classifier with respect to claims analyzed

by the Court of Appeals for the Federal Circuit).

19. Id.

20. See Mayo Collaborative Servs. v. Prometheus Labs., Inc., 132 S. Ct. 1289 (2012) (addressing a method

for administering a drug, and holding that a newly discovered law of nature is unpatentable and that the

application of that law is also normally unpatentable if the application merely relies on elements already known

in the art); Alice, 134 S. Ct. at 2355–60 (applying the Mayo analysis to claims to a computer system and method

for electronic escrow; holding the claims invalid because they were directed to an abstract idea, and did not

include sufficiently more to transform the abstract idea into a patent-eligible invention).


Step 1: Is the claim to a process, machine, manufacture, or composition of matter? If YES, proceed to Step 2A; if NO, the claim is not eligible subject matter under 35 U.S.C. § 101.21

Step 2A: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea? If YES, proceed to step 2B; if NO, the claim qualifies as eligible subject matter under 35 U.S.C. § 101.22

Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? If YES, the claim is eligible; if NO, the claim is ineligible.23
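Expressed as a decision procedure, the test above can be sketched in a few lines of Python; the three predicate functions stand in for the legal judgments made at each step and are assumptions of this illustration, not components of any actual implementation.

def alice_test(claim, is_statutory_category, is_directed_to_exception,
               recites_significantly_more):
    """Sketch of the two-part Alice/Mayo framework as a procedure.

    The three predicate arguments are stand-ins for the legal judgments
    made at Steps 1, 2A, and 2B, respectively.
    """
    if not is_statutory_category(claim):
        return "ineligible"  # Step 1: not a process, machine, manufacture, or composition of matter
    if not is_directed_to_exception(claim):
        return "eligible"    # Step 2A: no judicial exception implicated
    if recites_significantly_more(claim):
        return "eligible"    # Step 2B: additional elements amount to significantly more
    return "ineligible"      # Step 2B: the claim adds nothing significantly more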

The test has two main parts.24 The first part of the test, in Step 1, asks

whether the claim is to a process, manufacture, machine, or composition of

matter.25 This is simply applying the plain text of Section 101 of the patent

statute to ask whether a patentable “thing” is being claimed.26 As a general

matter, this part of the test is easy to satisfy. If the claim recites something that

is recognizable as an apparatus/machine, process, manufacture, or composition

of matter, Step 1 of the test should be satisfied.27 If Step 1 of the test is not

satisfied, the claim is not eligible, and the analysis ends.28

The second part of the test attempts to identify claims that are directed to

judicial exceptions to the statutory categories.29 The second part of the test has

two subparts.30 Step 2A is designed to ferret out claims that, on their surface,

claim something that is patent eligible (e.g., a computer), but contain within

them a judicial exception.31 Step 2A asks whether the claim is directed to one

of the judicial exceptions.32 If not, then the claim qualifies as patent eligible.33

If so, however, Step 2B must be evaluated.34

The judicial exceptions in Step 2A include laws of nature, abstract ideas,

and natural phenomena.35 The category of abstract ideas can be broken down

into four subcategories: fundamental economic practices, ideas in and of

themselves, certain methods of organizing human activity, and mathematical

relationships and formulas.36 Fundamental economic practices include, for

21. Alice, 134 S. Ct. at 2355.

22. Id.

23. 2014 Interim Guidance on Patent Subject Matter Eligibility, 79 Fed. Reg. 74618, 74621 (Dec. 16,

2014) [hereinafter 2014 Guidance] (to be codified at 37 C.F.R. pt. 1).

24. Alice, 134 S. Ct. at 2355.

25. Id.

26. 35 U.S.C. § 101 (2018) (“Whoever invents or discovers any new and useful process, machine,

manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent

therefor.”).

27. Alice, 134 S. Ct. at 2355.

28. See, e.g., In re Ferguson, 558 F.3d 1359, 1364–66 (Fed. Cir. 2009) (holding that contractual

agreements and companies are not patentable subject matter); In re Nuijten, 500 F.3d 1346, 1357 (Fed. Cir.

2007) (holding that transitory signals are not patentable subject matter).

29. Alice, 134 S. Ct. at 2355.

30. Id.

31. Id.

32. Id.

33. Id.

34. Id.

35. Id.

36. Id. at 2355–56.


example, creating contractual relationships, hedging, or mitigating settlement

risk.37 Ideas in and of themselves include, for example, collecting and

comparing known information, diagnosing a condition by performing a test and

thinking about the results, and organizing information through mathematical

correlation.38 Methods of organizing human activity include, for example,

creating contractual relationships, hedging, mitigating settlement risk, or

managing a game of bingo.39 Mathematical relationships and formulas include,

for example, an algorithm for converting number formats, a formula for

computing alarm limits, or the Arrhenius equation.40

In Step 2B, the test asks whether the claims recite additional elements that

amount to “significantly more” than the judicial exception.41 In the computing

context, this part of the test is trying to catch claims that are merely applying an

abstract idea within a computing system, without adding significant additional

elements or limitations.42 Limitations that may be enough to qualify as

“significantly more” when recited in a claim with a judicial exception include,

for example: improvements to another technology or technical field;

improvements to the functioning of the computer itself; effecting a

transformation or reduction of a particular article to a different state or thing; or

adding unconventional steps that confine the claim to a particular useful

application.43

Limitations that have been found not to be enough to qualify as

“significantly more” when recited in a claim with a judicial exception include,

for example: adding the words “apply it” with the judicial exception; mere

instructions to implement an abstract idea on a computer; simply appending

well-understood, routine and conventional activities previously known to the

industry, specified at a high level of generality, to the judicial exception; or

adding insignificant extra-solution activity to the judicial exception.44

The Alice test is now being applied by federal agencies and courts at the

beginning and end of the patent lifecycle.45 With respect to the application phase

of a patent, shortly after the Alice decision, the Patent Office issued to the

examination corps instructions for implementing the Alice test.46 These

preliminary instructions were supplemented in December 2014 by the 2014

Guidance.47 As we will show in Section IV, below, the Patent Office has applied

37. See, e.g., Bilski v. Kappos, 561 U.S. 593 (2010) (mitigating settlement risk).

38. See, e.g., Digitech Image Tech., LLC v. Elec. for Imaging, Inc., 758 F.3d 1344 (Fed. Cir. 2014)

(organizing information through mathematical correlations).

39. See, e.g., buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1353–55 (Fed. Cir. 2014) (involving

contractual relationships).

40. See, e.g., Gottschalk v. Benson, 409 U.S. 63, 64–67 (1972) (involving an algorithm for converting

number formats); Diamond v. Diehr, 450 U.S. 175, 177–181 (1981) (involving the Arrhenius equation).

41. 2014 Guidance, supra note 23, at 74619.

42. Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 134 S. Ct. 2347, 2357–58 (2014).

43. 2014 Guidance, supra note 23, at 74624 (citations omitted).

44. Id.

45. Id. at 74631.

46. Preliminary Examination Instructions, supra note 8.

47. 2014 Guidance, supra note 23.


this test widely, with significant numbers of rejections appearing in specific

subject matter areas.48

With respect to the enforcement phase of the patent lifecycle, the Federal

Courts have been actively applying the Alice test to analyze the validity of patent

claims in the litigation context.49 As of June 2016, over 500 patents had been

challenged under Alice, with a resulting invalidation rate exceeding 65%.50 The

Court of Appeals for the Federal Circuit has itself heard over fifty appeals that

have raised the Alice issue.51

Note that when we speak of the “Alice test” in the context of the Patent

Office, we include the entire body of case law that has developed in the wake of

the Mayo and Alice decisions.52 The cases following Alice have refined and

clarified the Alice two-step analysis with respect to particular fact contexts.53

The Patent Office has made considerable effort to keep abreast of these decisions

and to train the examining corps as to their import.54 To a large degree then, the

Patent Office embodies the current state of subject matter eligibility law.55 And

while this law is never static, it is also not changing so quickly as to undermine

one of the central premises of this Article, which is that the Patent Office can be

used as a source of examples of a decision maker (in this case, a sort of “hive

mind” comprising many thousands of individual examiners) applying a legal

rule to determine whether a patent claim is subject matter eligible.56 Assuming

that the application of the rule is not completely random, as we will show in

Section IV, it should be possible to train a machine to learn the rule (or its

approximation) based on our collection of examples.

II. RENDERING LEGAL SERVICES IN THE SHADOW OF ALICE

In this Section, we motivate a computer-assisted approach for rendering

legal advice in the context of Alice. Alice creates a new patentability question

that must be answered before and during the preparation, prosecution, and

48. See Tran, supra note 17, at 357 (discussing the increase in patent rejections resulting from the Alice

test).

49. Id. at 358.

50. Id.

51. Chart of Subject Matter Eligibility Court Decisions, USPTO, https://www.uspto.gov/sites/default/

files/documents/ieg-sme_crt_dec.xlsx (updated July 31, 2017).

52. See, e.g., Enfish LLC v. Microsoft Corp., 822 F.3d 1327 (Fed. Cir. 2016); Bascom Global Internet

Servs., Inc. v. AT&T Mobility LLC, 827 F.3d 1341 (Fed. Cir. 2016); McRO, Inc. v. Bandai Namco Games Am.

Inc., 837 F.3d 1299 (Fed. Cir. 2016); Amdocs (Israel) Ltd. v. Openet Telecom, Inc., 841 F.3d 1288 (Fed. Cir.

2016); DDR Holdings, LLC v. Hotels.com, L.P., 773 F.3d 1245 (Fed. Cir. 2014); Ultramercial, Inc. v. Hulu,

LLC, 772 F.3d 709 (Fed. Cir. 2014).

53. See, e.g., McRO, Inc., 837 F.3d at 1303 (applying the Alice test to a patent relating to a method for

automation of 3-D animation of facial expressions).

54. See Recent Subject Matter Eligibility Decisions, USPTO (May 19, 2016), https://www.uspto.gov/

sites/default/files/documents/ieg-may-2016_enfish_memo.pdf (summarizing post-Alice Supreme Court

decisions); see also Recent Subject Matter Eligibility Decisions, USPTO (Nov. 2, 2016), https://www.uspto.gov/

sites/default/files/documents/McRo-Bascom-Memo.pdf [hereinafter Recent Subject Matter Eligibility

Decisions] (illustrating the Patent Office’s discussion of decisions of the Court of Appeals for the Federal Circuit,

including Enfish, McRO, and Bascom).

55. See Recent Subject Matter Eligibility Decisions, supra note 54 (discussing the Patent Office policy

mirroring eligibility law).

56. See id. (elaborating on how to predict the Patent Office’s application of the Alice test).


enforcement of a patent.57 Increased access to data allows us to implement a

data-driven, predictive computer system for efficiently answering the Alice

patentability question, possibly yielding economic efficiencies.

Alice casts a shadow over virtually every phase of the lifecycle of a patent,

including preparation, prosecution, and enforcement.58 Inventors want to

understand as an initial matter whether to even attempt to obtain patent

protection for their inventions. The cost to prepare and file a patent application

of moderate complexity can easily exceed $10,000, and inventors would like to

know whether it is worth it even to begin such an undertaking.59

In addition, there are hundreds of thousands of “in flight” patent

applications, all prepared and filed prior to the Alice decision.60 These

applications likely do not include the necessary subject matter or level of detail

that may be required to overcome a current or impending Alice rejection.61 These

applications may not contain evidence of how the invention improves the

operation of a computing system or other technology.62 In such cases, patent

applicants want to know whether it is even worth continuing the fight, given that

they must pay thousands of dollars for every meaningful interaction with a patent

examiner.63

In the enforcement phase of the patent lifecycle, litigants want to know the

likelihood that an asserted patent will be invalidated under Alice. Both parties

to a suit rely on such information when deciding whether to settle or continue

towards trial.64 For plaintiffs, the increased likelihood of fee shifting raises the

stakes even further.65 From an economic welfare perspective, providing

patentees with accurate information regarding the likelihood of invalidation

should result in a reduction in the inefficient allocation of resources, by

shortening or reducing the number of lawsuits.

57. Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 134 S. Ct. 2347, 2355 (2014).

58. See generally Tran, supra note 17 (discussing the recent implications of the Alice decision).

59. 2015 Report of the Economic Survey, AM. INTELL. PROP. L. ASSOC., I-85, https://www.aipla.org/

learningcenter/library/books/econsurvey/2015EconomicSurvey/Pages/default.aspx (last visited Mar. 13, 2018)

[hereinafter 2015 Report] (the median legal fee to draft a relatively complex electrical/computer patent

application is $10,000).

60. See Tran, supra note 17, at 358 (discussing the significant decrease in patent grants after Alice was

handed down, which is likely due to the fact that the standard shift invalidated the patents next in the queue); see

U.S. Patent And Trademark Office Patent Technology Monitoring Team, U.S. Patent Statistics Chart Calendar

Years 1963–2015, USPTO.GOV, https://www.uspto.gov/web/offices/ac/ido/oeip/taf/us_stat.htm (last visited

Mar. 13, 2018) (showing that 615,243 patent applications were filed in 2014 and 629,647 patent applications

were filed in 2015).

61. Tran, supra note 17, at 357–58 (“Between July 1 and August 15, 2014, there were 830 patent

applications related [to] computer implemented inventions withdrawn from the U.S. Patent and Trademark

Office.”).

62. See generally Enfish LLC v. Microsoft Corp., 822 F.3d 1327, 1335–36 (Fed. Cir. 2016) (requiring

analysis of improvements to functioning of the computer or other related technology as a part of the Alice test);

Tran, supra note 17, at 358.

63. 2015 Report, supra note 59, at I-86 (stating that the median legal fee to prepare a response to an

examiner’s rejection for a relatively complex electrical/computer application is $3,000).

64. Id.

65. Octane Fitness, LLC v. Icon Health & Fitness, Inc., 134 S. Ct. 1749, 1753–54 (2014); see, e.g., Edekka

LLC v. 3Balls.com, Inc., 2015 U.S. Dist. LEXIS 168610, at *19–20 (E.D. Tex. Dec. 17, 2015) (awarding attorney fees

under 35 U.S.C. § 285 in a case dismissed for claims found invalid by Judge Gilstrap under Alice).


A. Intuition-Based Legal Services

Historically, attorneys have provided the above-described guidance by

applying intuition, folk wisdom, heuristics, and their personal and shared

historical experience. For example, in the context of patent prosecution

generally, the field is rife with (often conflicting) guiding principles,66 such as:

Make every argument you possibly can

To advance prosecution, amending claims is better than arguing

Keep argument to a minimum, for fear of creating prosecution history

estoppel or disclaimer

File appeals early and often

Interviewing the examiner expedites examination

Interviewing the examiner is a waste of time and money

Use prioritized examination—you’ll get a patent in twelve months!67

You are playing a lottery: if your case is assigned to a bad examiner,

give up hope!

Unfortunately, the above approaches are not necessarily effective or

applicable in all contexts. For example, while some approaches may have

worked in the past (e.g., during the first years of practice when the attorney

received her training), they may no longer be effective, given changes in Patent

Office procedures and training, changes in the law, and so on.68

Nor do the above approaches necessarily consider client goals. Different

clients may desire different outcomes, depending on their market, funding needs,

budget, and the like. Example client requirements include short prosecution

time (e.g., get a patent as quickly as possible), long prosecution time (e.g., delay

prosecution during clinical trials), obtaining broad claims, minimizing the

number of office actions (because each office action costs the client money), or

the like.69 It is clear that any one maxim or approach to patent prosecution is

not going to optimize the outcome for every client in every possible instance.

While a truly optimal outcome may not be possible, in view of the randomness

and variability in the examination system, it is undoubtedly possible to do better.

In the following Subsection, we assert that a data-driven approach can yield

improved outcomes and economic efficiencies for the client.

66. The following list is based on the Author’s personal experience as a patent prosecutor. At one time

or another, the Author has worked with a client, supervisor, or colleague who has insisted on following one or

more of the presented guidelines.

67. USPTO, Prioritized Examination, 76 Fed. Reg. 59050 (Sept. 23, 2011) (promising to provide a final

disposition for a patent application within one year).

68. E.g., The America Invents Act, Pub. L. No. 112-29 (effective September 16, 2012) (making

substantial changes to patent law, including moving from a first-to-invent to a first-to-file priority system); see

also, Summary of the America Invents Act, AM. INTELLECTUAL PROP. L. ASSOC., http://www.aipla.org/advocacy/

congress/aia/Pages/summary.aspx (last visited Mar. 19, 2018) (summarizing the changes made by the America

Invents Act).

69. See 2015 Report, supra note 59, at I-86 (discussing the expensive stages of filing patents).


B. Data-Driven Patent Legal Services

A data-driven approach promises to address at least some of the

shortcomings associated with the traditional approach to providing patent-

related legal services. As a simple example, many clients are concerned with

the number of office actions required to obtain a patent.70 This is because each

office action may cost the client around $3,000 in attorney fees to formulate a

response.71 For large corporate clients, with portfolios numbering in the

thousands of yearly applications, reducing the average number of office actions

(even by a fractional amount) can yield significant savings in yearly

fees to outside counsel.72 For small clients and individual inventors, one less

office action may be the difference between pushing forward and abandoning a

case. Is it possible to use data about the functioning of the patent office to better

address the needs of these different types of clients?

In the academic context, prior studies considering patent-related data have

focused largely on understanding or measuring patent breadth, quality, and/or

value using empirical patent features. One body of literature uses patent citation

counts and other features (e.g., claim count, classification identifiers) of an

issued patent to attempt to determine patent value.73 Others have studied the

relationship between patent scope and firm value.74 Other empirical work has

analyzed prosecution-related data in order to determine patent quality.75

For this project, we are more interested in predicting how decision makers

(e.g., judges or patent examiners) will evaluate patent claims. We make such

predictions based on the prior behaviors and actions of those decision makers.

Fortunately, it is now becoming increasingly possible to cheaply obtain and

analyze large quantities of data about the behaviors of patent examiners and

judges.76

In the patent prosecution context, the Patent Office hosts the PAIR (Patent

Application Information Retrieval) system, which provides the “file wrapper”

for every published application or issued patent.77 The patent file wrapper

70. See, e.g., id. (excessive legal fees for patent applications are burdensome to clients).

71. E.g., id. (the median legal fee to prepare an amendment/argument for a relatively complex

electrical/computer application is $3,000).

72. E.g., 25 Years of Patent Leadership, IBM RES., https://www.research.ibm.com/patents/ (last visited

Mar. 19, 2018) (describing how IBM received over 9,000 patents in 2017; saving even $1,000 per patent would

yield significant savings).

73. See, e.g., John R. Allison et al., Valuable Patents, 92 GEO. L.J. 435 (2004); Nathan Falk & Kenneth

Train, Patent Valuation with Forecasts of Forward Citations, 12 J. OF BUS. VALUATION AND ECON. LOSS

ANALYSIS 101 (2017); Mark P. Carpenter et al., Citation Rates to Technologically Important Patents, 3 WORLD

PAT. INFO. 160 (1981).

74. See, e.g., Joshua Lerner, The Importance of Patent Scope: An Empirical Analysis, 25.2 RAND J. OF

ECON. 319 (1994) (describing that patent classification is used as a proxy for scope).

75. See, e.g., Ronald J. Mann & Marian Underweiser, A New Look at Patent Quality: Relating Patent

Prosecution to Validity, 9 J. EMPIRICAL LEGAL STUD. 1 (2012) (discussing two hand-collected data sets to

analyze patent quality).

76. Alumnus Winship Creates Juristat to Mine Patent Prosecution Data for Clients, WASH. U. L.,

http://law.wustl.edu/m/content.aspx?id=10059 (last visited Jan. 28, 2018).

77. Patent Application Information Retrieval, U.S. PAT. & TRADEMARK OFFICE,

http://portal.uspto.gov/pair/PublicPair (last visited Mar. 13, 2018). In addition, bulk data downloads are

available at: USPTO Bulk Downloads: Patents, GOOGLE, https://www.google.com/googlebooks/uspto-


includes every document, starting with the initial application filing, filed by the

applicant or examiner during prosecution of a given patent application.78

A number of commercial entities provide services that track and analyze

prosecution-related data.79 These services provide reports that summarize

examiner or group-specific behaviors and trends within the Patent Office,

including allowance rates, appeal dispositions, timing information, and the

like.80 Such information can be used to tailor prosecution techniques to a

specific examiner or examining group. For example, if the examiner assigned

to a particular application has, based on his work on other cases, shown himself

to be stubborn (e.g., as evidenced by a high appeal rate, high number of office

actions per allowance, or the like), then the client may elect to appeal the case

earlier than usual, given that further interaction with the examiner may be of

limited utility.

In the context of Alice, we can learn many things from patent prosecution

data. As one example, we can learn which art units or subject matter classes are

subject to the most Alice rejections.81 While this is useful, it is not always known

a priori how a new application will be classified by the Patent Office. As

another example, we can learn which examiners are particularly prone to issue

Alice rejections and, perhaps more interestingly, how likely an applicant is to

overcome that rejection based on the examiner’s decisions in other cases.

Dissecting the data further, we may even be able to learn what types of

arguments are successful in overcoming Alice rejections.

C. Predicting Subject Matter Rejections Yields Economic Efficiencies

While the above types of information may be valuable to an applicant in

the midst of examination, they are not so useful in the pre-application or post-

issuance phases of the lifecycle of a typical patent.82 A client wishing to file an

application for an invention will want to know how likely he is to encounter an

Alice rejection. As another example, a client with an issued patent will want to

know how likely it is that her patent will be invalidated by a court.

In view of the above, the goal of this work is to predict whether a particular

patent claim will be considered valid or invalid under Alice, based on patent

prosecution-related data obtained from the Patent Office. As described in detail

below, such a prediction can be made based on relationships between specific

patents.html (last visited Mar. 13, 2018) [hereinafter USPTO Bulk Downloads]; USPTO Data Sets, REED TECH,

http://patents.reedtech.com/index.php (last visited Mar. 13, 2018).

78. MANUAL OF PATENT EXAMINING PROCEDURE § 719 (9th ed. rev. 7, Nov. 2015); 37 C.F.R. § 1.2

(2015).

79. See e.g., JURISTAT, https://www.juristat.com/ (last visited Mar. 13, 2018); LexisNexis PatentAdvisor,

REED TECH, http://www.reedtech.com/products-services/intellectual-property-solutions/lexisnexis-

patentadvisor (last visited Mar. 13, 2018).

80. Product Primers, JURISTAT, https://resources.juristat.com/product-primers/ (last visited Mar. 13,

2018).

81. See infra Section IV and Plots 1–3.

82. See Tara Klamrowski, How to Engineer Your Application to Avoid Alice Rejections, REED TECH (Oct.

19, 2017), http://knowledge.reedtech.com/intellectual-property-all-posts/how-to-engineer-your-application-to-

avoid-alice-rejections (describing other methods to avoid an Alice rejection).


claim terms and the presence or absence of corresponding subject matter

rejections issued by the Patent Office.83

In related work, Aashish Karkhanis and Jenna Parenti have identified

correlations between specific terms in a patent claim with patent eligibility.84

Our work differs from and expands upon that of Karkhanis and Parenti in a

number of ways. First, we rely on the decisions made by patent examiners rather

than judges.85 The number of claims that have been evaluated for eligibility

under Alice in the Patent Office is several orders of magnitude larger than the

number of claims that have been similarly evaluated by the courts.86 This means

that we have significantly more data to utilize for analysis and machine learning

efforts. Second, we have developed a computer program that mechanizes a

human decision-making process by exploiting relationships between claim

terms and validity to classify claims as valid or invalid. Third, we use our

mechanism to estimate the impact on the body of patents issued prior to the Alice

decision.

Predicting potential Alice-based validity issues provides benefits in every

phase of the patent lifecycle.87 For example, such predictions can be employed

to determine whether to even file a patent application for a given invention.88 If

it is possible, a priori, to cheaply determine whether a particular invention is

directed to patent ineligible subject matter, then a client may be saved tens of

thousands of dollars in legal fees.89 While legal fees spent in pursuit of an

invalid patent will surely enrich the patent attorney who receives them, such fees

represent economic waste. Wasted legal fees are resources that could be more

productively and efficiently employed in some other context.

In patent preparation or prosecution, predicting subject matter eligibility

issues can help attorneys better craft or amend claims. For a given patent claim,

such a prediction may serve as an “early warning” sign that can help put the

client and attorney on notice that a claim as drafted may be rejected by the Patent

Office on subject matter grounds.90 The claim drafter can then iteratively

modify the claim to settle on more detailed claim language that may not suffer

from the abstractness issues that trigger a typical Alice rejection.91 Iteratively

obtaining feedback from a machine is much cheaper than doing so with a patent

examiner.92 As noted above, each interaction with an examiner results in

83. See infra Section V.

84. Aashish R. Karkhanis & Jenna L. Parenti, Toward an Automated First Impression on Patent Claim

Validity: Algorithmically Associating Claim Language with Specific Rules of Law, 19 STAN. TECH. L. REV. 205,

215 (2016).

85. Id.

86. See Tran, supra note 17, at 358 (indicating that 568 patents have been challenged in the courts under

Alice as of June 2016).

87. Karkhanis & Parenti, supra note 84, at 212.

88. Dugan, supra note 11 (an example system that provides such predictions).

89. 2015 Report, supra note 59 (demonstrating that the median legal fee to draft a relatively complex

electrical/computer patent application is $10,000).

90. Karkhanis & Parenti, supra note 84, at 212.

91. E.g., Dugan, supra note 11 (allowing iterative modification of patent claims).

92. See, e.g., Karkhanis & Parenti, supra note 84; How Much Does a Patent Cost: Everything You Need

to Know, UPCOUNSEL, https://www.upcounsel.com/how-much-does-a-patent-cost (last visited Mar. 13, 2018)

(describing the bottom line costs to filing a patent).


thousands of dollars in legal fees to the client.93 Reducing the number of

interactions with the examiner yields considerable savings to the client and the

examining corps, and thus increases economic efficiency.94

Provided that we can use the Patent Office as a proxy for the decision

making of the Federal Courts,95 our predictive techniques can be used to identify

weaknesses in asserted claims during the enforcement of a patent. For example,

during pre-suit investigation, a patentee could predict whether a given patent

claim is likely to be held invalid by the court.96 Providing patentees with such

pre-suit information, coupled with the threat of fee shifting under Octane Fitness,97 may result in a sharp decrease in baseless patent litigation.

Note that we are not claiming that our predictive tool will reduce the

amount of effort required to prepare patent claims. We instead assert that in at

least some of those cases where the invention is clearly directed to unpatentable

subject matter, no patent claims will be prepared at all, resulting in savings to

the client. In cases where the invention is on the borderline of patentability, the

patent attorney may in fact spend more time crafting claims that can avoid a

subject matter rejection.98 Although this will result in higher up-front costs to

the client,99 the client will typically save in the long run, as the number of

interactions with the Patent Office will be reduced.100

Nor does our system test claims for every possible basis of invalidity.

Patent claims may of course be invalid for many reasons, including for a lack of

utility, anticipation or obviousness in view of the prior art, indefiniteness, or a

lack of written description.101 Instead, our system only determines whether a

given claim is directed to patent-eligible subject matter under 35 U.S.C. § 101.

While automatically determining validity under other statutory grounds is an

open area of research, it is not addressed here.

In conclusion, we have presented a case for predictive technologies, such

as our machine classifier, which can assist patent practitioners in efficiently

93. E.g., 2015 Report, supra note 59, at I-86 (demonstrating that the median legal fee to prepare a response

to an examiner’s rejection for a relatively complex electrical/computer application is $3,000); see also USPTO

Fee Schedule, U.S. PAT. & TRADEMARK OFF., https://www.uspto.gov/learning-and-resources/fees-and-

payment/uspto-fee-schedule#exam (last visited Mar. 13, 2018) (providing information and fee rates for

examination services).

94. Id.

95. See infra Section VII.B (validating the performance of our classifier with respect to claims analyzed

by the Court of Appeals for the Federal Circuit).

96. Karkhanis & Parenti, supra note 84, at 212.

97. See generally, Octane Fitness, LLC v. Icon Health & Fitness, Inc., 134 S. Ct. 1749 (2014) (discussing

a standard for fee-shifting and attorney fees arrangements); Edekka LLC v. 3Balls.com, Inc., No. 2:15-CV-541-

JRG, 2015 WL 9225038, at *1 (E.D. Tex. Dec. 17, 2015).

98. See ROBIN JACOB ET AL., GUIDEBOOK TO INTELLECTUAL PROPERTY (2014), https://books.google.com/

books?id=FYvqAwAAQBAJ&h (“Borderline cases in this area will always be improved by ingenious framing

of claims, another reason why the services of an experienced patent attorney can be so valuable.”).

99. See Gene Quinn, The Cost of Obtaining a Patent in the US, IPWATCHDOG (Apr. 4, 2015),

http://www.ipwatchdog.com/2015/04/04/the-cost-of-obtaining-a-patent-in-the-us/id=56485/ (explaining the

more complex the invention, the higher attorney’s fee for patent prosecution).

100. See 2015 Report, supra note 59, at I-86 (demonstrating that the median legal fee to prepare a response

to an examiner’s rejection for a relatively complex electrical/computer application is $3,000); USPTO Fee

Schedule, supra note 93 (providing information and fee rates for examination services).

101. 35 U.S.C. §§ 101, 102, 103, 112 (2012) (explaining subject matter and utility, anticipation,

obviousness, and definiteness and written description).


analyzing claims for compliance with Alice. Such analysis can yield significant

economic efficiencies at nearly every stage of the patent lifecycle, including

patent application preparation, prosecution, and enforcement. In the following

Section, we provide an overview of the data collection method that we use to

obtain data for training our machine classifier.

III. DATA COLLECTION METHODOLOGY

In this Section, we describe our data collection methodology, and more

specifically our process for creating a dataset for training our machine classifier.

In brief, we obtain thousands of office actions issued by the Patent Office, each

of which is a written record of an examiner’s analysis and decision of a particular

patent application. We then process the office actions to determine whether the

examiner has accepted or rejected the pending claims of the application under

Alice. We then create a table that associates, in each row, a patent claim with an

indication of whether the claim passes or fails the Alice test.

Our method relies on, as raw material, those patent applications that have

been evaluated by the Patent Office for subject-matter eligibility under Alice. In

the Patent Office, each patent application is evaluated by a patent examiner, who

determines whether or not to allow the application.102 Under principles of

“compact prosecution,” the examiner is expected to analyze the claims for

compliance with every statutory requirement for patentability.103 The core

statutory requirements include those of patent-eligible subject matter, novelty,

and non-obviousness.104 If the examiner determines not to allow an application,

the examiner communicates the rejection to an applicant by way of an “office

action.”105 An office action is a writing that describes the legal bases and

corresponding factual findings supporting the rejection of one or more claims.106

Our approach inspects office actions issued after the Alice decision in order

to find examples of patent eligible and ineligible claims. As will be described

in detail below, these examples are employed in a supervised machine learning

application that trains a classifier to recognize eligible and ineligible claims. If

the office action contains an Alice rejection, then the rejected claim is clearly an

example of a patent-ineligible claim.107 On the other hand, if the office action

does not contain an Alice rejection, then the claims of the application provide

examples of patent-eligible claims, because we assume that the examiner has

evaluated the claims with respect to all of the requirements of patentability,

102. 35 U.S.C. § 131 (2012); 37 C.F.R. § 1.104 (2017).

103. 35 U.S.C. § 132 (2012); MANUAL OF PATENT EXAMINING PROCEDURE § 2103 (9th ed. rev. July 2015)

(“Under the principles of compact prosecution, each claim should be reviewed for compliance with every

statutory requirement for patentability in the initial review of the application, even if one or more claims are

found to be deficient with respect to some statutory requirement.”).

104. 35 U.S.C. § 101, 102, 103 (2012).

105. 35 U.S.C. § 132 (2012); 37 C.F.R. § 1.104 (2017); MANUAL OF PATENT EXAMINING PROCEDURE,

supra note 103, at § 706.

106. 35 U.S.C. § 132 (2012); 37 C.F.R. § 1.104 (2017); MANUAL OF PATENT EXAMINING PROCEDURE,

supra note 103, at § 706.

107. Alice Corp. v. CLS Bank Int’l, 134 S. Ct. 2347, 2354 (2014) (ruling on patent eligibility).


including Alice compliance.108 If no Alice rejection is present in an office action,

then the examiner must have determined that the claims were directed to eligible

subject matter.109
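The labeling rule just described can be sketched as follows in Python; the phrase-matching heuristic used here to detect an Alice rejection, and the assumption that such a rejection applies to all pending claims, are illustrative simplifications rather than the office action processing actually performed.

import re

# Illustrative phrases that commonly signal a subject matter rejection
# under Alice; real office action processing would be more involved.
ALICE_REJECTION_PATTERNS = [
    r"rejected under 35 U\.S\.C\.\s*(?:§|section)?\s*101",
    r"directed to an abstract idea",
    r"judicial exception",
]

def has_alice_rejection(office_action_text):
    return any(re.search(pattern, office_action_text, re.IGNORECASE)
               for pattern in ALICE_REJECTION_PATTERNS)

def label_claims(office_action_text, pending_claims):
    """Return (claim_text, label) rows; 0 = ineligible, 1 = eligible.

    Simplification: if the office action contains an Alice rejection,
    every pending claim is treated as an ineligible example; otherwise
    every pending claim is treated as an eligible example.
    """
    label = 0 if has_alice_rejection(office_action_text) else 1
    return [(claim, label) for claim in pending_claims]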

The goal, therefore, is to find office actions issued after the time at which

the Patent Office at large began examining cases for compliance with the rule of

Alice. Alice was decided on June 19, 2014.110 The Patent Office issued

preliminary instructions for subject-matter eligibility examination on June 25,

2014.111 These instructions were supplemented and formalized in the 2014

Guidance, issued December 16, 2014.112 In view of this regulatory history of

the Patent Office, and partly based on personal experience receiving Alice

rejections, we selected October 2014 as the relevant cutoff date.113 Any office

action issued after the cutoff date therefore represents an evaluation of a patent

application under Alice.

The following outlines the steps of our data collection process. As a

background step, we created a patent document corpus. The patent document

corpus is based on full text data provided by the Patent Office of every patent

issued since 1996 and application published114 between 2001 and the present.115

We store some of the patent and application data in a full text index.116 The

index includes fields for document type (e.g., application or patent), dates (e.g., filing date, publication date, issue date), document identifiers (e.g., application

number, publication number, patent number), technical classification, title,

abstract, claim text, and the like.117 At the time of writing, approximately 4.8

million published applications and 4.0 million patents have been indexed. The

use of the patent document corpus will be described further below.

Figure 1, below, is a generalized flow diagram that illustrates data

collection operations performed to obtain office actions for analysis.

108. 2014 Guidance, supra note 23 (requiring consideration of Alice during examination).

109. Id.

110. Alice, 134 S. Ct. at 2347.

111. Preliminary Examination Instructions, supra note 8.

112. 2014 Guidance, supra note 23.

113. The data analysis presented in infra Plot 1 supports our decision to use October 2014 as the cutoff

date.

114. 35 U.S.C. § 122 (2012); 37 C.F.R. § 1.104 (2017).

115. USPTO Bulk Downloads: Patents, supra note 77; United States Patent and Trademark Office Bulk

Data Downloads, REED TECH, http://patents.reedtech.com/index.php (last visited Mar. 13, 2018). USPTO Bulk

Data includes Patent Grant Full Text Data and Patent Application Full Text Data. The data is hosted by the

USPTO and third party vendors, including Google USPTO Bulk Downloads and Reed Tech USPTO Data Sets.

116. See APACHE SOLR, http://lucene.apache.org/solr/ (last visited Mar. 13, 2018), to view the Apache

Software that was used for text indexing in our data collection process.

117. See Patent Application Full Text and Image Database, U.S. PATENT & TRADEMARK OFFICE,

http://appft.uspto.gov/netahtml/PTO/ search-adv.html (last visited Mar. 13, 2018) (listing identifiers for patent

application).


Figure 1: Data Collection Process

Initially, we collect the file histories (“file wrappers”) for a randomly

selected set of application numbers.118 Each file history is a ZIP archive file that

includes multiple documents, including the patent application as filed, office

actions, notices, information disclosure statements, applicant responses, claim

amendments, and the like. At the time of this writing, over 180,000 file histories

have been downloaded.

As discussed above, we are interested in finding office actions issued by

the Patent Office on or after October 2014. The Patent Office uses a naming

convention to identify the files within a file history.119 For example, the file

12972753-2013-04-01-00005-CTFR.pdf is a Final Rejection (as indicated by the

document code CTFR) dated April 1, 2013, for application number

12/972,753.120 When a file history ZIP file is unpacked, the document code can

be used to identify relevant documents, which in this case are Non-Final and

Final Rejections.121
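The following minimal Python sketch illustrates this filtering step; it is presented for illustration only, assumes the file histories are stored locally as ZIP archives, and uses a function name of our own choosing rather than any actual implementation detail.

    import re
    import zipfile

    # Office action document codes: CTNF = Non-Final Rejection, CTFR = Final Rejection.
    DOC_CODE_PATTERN = re.compile(
        r"^(?P<appnum>\d{8})-(?P<date>\d{4}-\d{2}-\d{2})-\d+-(?P<code>CTNF|CTFR)\.pdf$"
    )

    def office_action_entries(wrapper_path):
        """Yield (application number, mailing date, file name) for each Non-Final
        or Final Rejection found inside a file-history ZIP archive."""
        with zipfile.ZipFile(wrapper_path) as wrapper:
            for name in wrapper.namelist():
                match = DOC_CODE_PATTERN.match(name)
                if match:
                    yield match.group("appnum"), match.group("date"), name

Running a loop of this kind over the downloaded archives yields the rejection documents to be processed in the following steps.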

At the time of this writing, about 90,000 office actions have been obtained.

The number of office actions is smaller than the number of total file histories

because many of the file histories are associated with applications that have yet

to be examined, and therefore do not contain any office actions.122

118. See APACHE SOLR, supra note 116 (presenting Apache service used for data collection).

119. See Patent Application Information Retrieval, U.S. PAT. & TRADEMARK OFF., https://portal.uspto.gov/

pair/PublicPair (last visited Mar. 13, 2018) (providing access to patent application information with explanation

of document codes).

120. Id.

121. Identified by the document codes CTNF and CTFR, respectively.

122. See Patent Process Overview, U.S. PAT. & TRADEMARK OFF., https://www.uspto.gov/patents-getting-

started/patent-process-overview (last visited Mar. 13, 2018) (showing the timeline of patent prosecution); 35

U.S.C. § 122 (2012).


Each office action in the file history is a PDF file that includes TIFF images

of the pages of the document produced by the examiner.123 As each page of an

office action is represented as a TIFF image, each page of the office action must

be run through an optical character recognition (OCR) module to convert the

PDF file to a text file.124
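A minimal sketch of this conversion step, assuming the pdf2image and pytesseract Python wrappers around Tesseract (the function name is illustrative only):

    import pytesseract
    from pdf2image import convert_from_path

    def ocr_office_action(pdf_path):
        """Rasterize each page of an office action PDF and run Tesseract OCR
        over it, returning the concatenated text of the document."""
        pages = convert_from_path(pdf_path)  # one image per page
        return "\n".join(pytesseract.image_to_string(page) for page in pages)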

Once an office action is converted to text, it can be searched for strings that

are associated with Alice rejections. An example Alice rejection found in an

Office Action issued in U.S. Patent Application No. 14/543,715 reads as

follows:

Claim Rejections - 35 USC § 101

35 U.S.C. § 101 reads as follows: “Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.”

Claim 1, 9 & 17 are rejected under 35 U.S.C. § 101 because the claimed invention is not directed to patent eligible subject matter. Based upon consideration of all of the relevant factors with respect to the claim as a whole, claim(s) 1, 9 & 17 are determined to be directed to an abstract idea. . . . The claim(s) are directed to the abstract idea of organizing human activities utilizing well known and understood communication devices and components to request and receive multimedia content by a customer.125

Patent examiners tend to rely on form paragraphs provided by the Patent

Office when making or introducing a rejection, so, fortunately, there is a high

level of consistency across office actions.126 Text strings such as the following

were used to identify actions that contained an Alice rejection: “35 USC § 101,”

“abstract idea,” “natural phenomenon,” and the like.127

From the full set of obtained office actions, we selected those issued during

or after October 2014, a total of about 32,000 office actions. We then analyzed

each office action in this subset to determine whether it contained an Alice

rejection. If an office action did contain an Alice rejection, then the

corresponding application was tagged as including a patent-ineligible claim

(sometimes also termed REJECT); conversely, if an office action did not contain

an Alice rejection, then the corresponding application was tagged as including

123. See Patent Application Information Retrieval, supra note 119 (providing access to patent application

history in the format of PDF files).

124. See Tesseract-OCR, GITHUB, https://github.com/tesseract-ocr (last visited Mar. 13, 2018) (providing

an optical character recognition program).

125. Office Action dated Dec. 17, 2014 U.S. Patent Application No. 14/543,715.

126. See, e.g., MANUAL OF PATENT EXAMINING PROCEDURE §706.03(a), Form Paragraph 7.05.015, (8th

ed. Rev. 7, Sept. 2008) (explaining that “the claimed invention is directed to a judicial exception (i.e., a law of

nature, a natural phenomenon, or an abstract idea) without significantly more. Claim(s) [1] is/are directed to

[2]. The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than

the judicial exception because [3].”).

127. Specifically, we identify Alice rejections by searching for the strings “abstract idea” and “natural

phenom*”. While this technique is efficient, it does result in the rare false positive, such as when an examiner

writes, “[t]he claims are not directed to an abstract idea.”


eligible claims (or ACCEPT).128 Based on the 32,000 office actions issued

during the relevant period, about 26,000 applications have been identified as

eligible, and 3,000 as ineligible.
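The tagging logic reduces to a date filter plus a case-insensitive string search. The following is a minimal sketch, assuming each office action has already been converted to text and paired with its mailing date; the helper name and cutoff constant are illustrative:

    import re
    from datetime import date

    CUTOFF = date(2014, 10, 1)
    ALICE_PATTERN = re.compile(r"abstract idea|natural phenom", re.IGNORECASE)

    def tag_application(mail_date, action_text):
        """Return 'REJECT' if a post-cutoff office action contains an Alice
        rejection, 'ACCEPT' if it does not, and None if it predates the cutoff."""
        if mail_date < CUTOFF:
            return None
        return "REJECT" if ALICE_PATTERN.search(action_text) else "ACCEPT"

As noted in footnote 127, a pattern this simple occasionally produces a false positive, such as when an examiner states that the claims are not directed to an abstract idea.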

The next step of the process is to identify the claim that is subject to the

Alice rejection. Typically, the examiner will identify the claims rejected under

a particular statutory provision. For example, the examiner may write “Claims

1, 3–5, and 17–20 are rejected under 35 U.S.C. § 101 . . . .” Ideally, we would

parse this sentence to identify the exact claims rejected under Alice. However,

we made the simplifying assumption that, at a minimum, the first independent

claim (typically claim 1) was being rejected under Alice.129

We make another simplifying assumption to find the actual claim text

rejected under Alice. In particular, we pull the text of the first independent claim

(“claim 1”) of the published patent application stored in the patent document

corpus described above. Note that this claim is typically the claim that is filed

with the original patent application, although it is not necessarily the claim that

is being examined when the examiner makes the Alice rejection. For example,

the applicant may have amended claim 1 at some time after filing and prior to

the particular office action that includes the Alice rejection. However, it is

unlikely that the claim 1 pending at the time of the Alice rejection is markedly

different from the originally filed claim 1. If anything, the rejected claim is

likely to be more concrete and less abstract due to further amendments that have

been made during examination.
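For illustration only, retrieving the published claim 1 from a Solr-backed corpus might look like the sketch below, which uses the pysolr client; the core name and field names (appnum, claims) are hypothetical and are not drawn from our actual index schema.

    import pysolr

    solr = pysolr.Solr("http://localhost:8983/solr/patents", timeout=10)

    def first_claim(application_number):
        """Look up the published application in the patent document corpus and
        return the text of claim 1, or None if it is not indexed."""
        results = solr.search(f"appnum:{application_number}", rows=1)
        for doc in results:
            claims = doc.get("claims", [])
            return claims[0] if claims else None
        return None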

We use claim 1 from the published application because it can be efficiently

and accurately obtained. Each patent file history contains documents that reflect

the amendments made to the claims by the applicant.130 It is therefore

technically possible to OCR those documents to determine the text of the claims

pending at the time of an Alice rejection. However, because applicants reflect

amendments to the claims by using strikethrough and underlining, these text

features greatly reduce the accuracy of our OCR system. In the end, we decided

to rely on the exact claim text available from the patent document corpus instead

of the degraded OCR output of the actual claim subjected to the Alice rejection.

Further work will show whether this assumption had a significant impact on the

results presented here.

For applications that were examined during the relevant time period but

that were not subject to an Alice rejection (that is, they “passed” the test), we

128. Note that it is possible for an application to be labeled both REJECT and ACCEPT, due to a first

office action that includes an Alice rejection and a second office action that does not include an Alice rejection.

129. It should never be the case that a dependent claim will be rejected under Alice if its corresponding

independent claim is not rejected under Alice, as dependent claims are strictly narrower than their parent claims.

Moreover, based on the author’s personal experience as a patent prosecutor, it is very rare that an examiner will,

within one set of claims, allow one independent claim under Alice while rejecting another. Typically, all of the

claims rise and fall together under Alice, since the analysis is intentionally designed to ferret out abstractions

even when they are claimed in the more mechanical claim formats (e.g., apparatus vs. method). Our simplifying

assumption was supported via a manual spot check of over a hundred cases: the examiner reached different

conclusions for different independent claims in only a handful of applications.

130. 37 C.F.R. § 1.121 (2003); see Quick Tip: Examining the File History, ARTICLE ONE PARTNERS,

https://www.articleonepartners.com/blog/quick-tip-examining-the-file-history/ (last visited Mar. 13, 2018)

(defining what is included in the file history of a patent application).


prefer to use claim 1 from the patent (if any) that issued on the corresponding

application. Claim 1 from the issued patent is preferred, because it reflects the

claim in final form, after it has been evaluated and passed all of the relevant

statutory requirements, including subject matter eligibility under Alice, based on

the existence of an office action issued after October 2014. If there is no issued

patent, such as because the applicant and examiner are still working through

issues of novelty or non-obviousness, we currently elect not to use claim 1 from

the published patent application. For machine learning purposes, this has

resulted in slightly improved performance, possibly because the claims

evaluated for Alice compliance were actually markedly different than the

published claim 1.

Note that the above-described process was iteratively and adaptively

performed. From an initial random sample of patent applications, it was possible

to identify those classes where Alice rejections were common. Then, we

preferentially obtained additional applications from those Alice rejection-rich

classes, in order to increase the likelihood of obtaining office actions that

contained Alice rejections.

Once we perform the above-described data collection, the identified claims

are stored in a table. Each row of the table includes an application number, a

patent classification identifier, an Alice-eligibility indicator (e.g., a tag of

“accept” or “reject”) denoting whether the claim was accepted or rejected by the

examiner, and the text of the claim itself. Patent classification is used by the

Patent Office to group patents and applications by subject matter area.131 The

patent classification identifier is a Cooperative Patent Classification (CPC)

scheme identifier that is obtained from the patent document corpus, as assigned

by the Patent Office to each patent application and issued patent.132 Retaining

the patent classification allows us to break down analysis results by subject

matter area.
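A minimal sketch of the resulting table, written here as a CSV file with illustrative column names:

    import csv

    def write_dataset(rows, path="alice_dataset.csv"):
        """Each row is (application number, CPC identifier, 'accept' or 'reject'
        tag, text of claim 1)."""
        with open(path, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(["appnum", "cpc", "tag", "claim_text"])
            writer.writerows(rows)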

In this Section we have described our process for creating our dataset. In

Section IV, next, we present an analysis of the obtained data. In Section V,

below, we use the obtained data to train a machine classifier to determine

whether or not a claim is directed to patent eligible subject matter.

IV. DATA ANALYSIS RESULTS

In this Section, we present results from a data analysis of office actions

issued by the Patent Office. Here, we are trying to answer the following

questions. First, can we “see” the impact of Alice in the actions of the Patent

Office? Put another way, do we see an increase in the number of subject matter

rejections coming out of the Patent Office? Second, which subject matter areas,

if any, are disproportionately subject to Alice rejections?

131. 35 U.S.C.S. § 8 (LexisNexis 2018); MANUAL OF PATENT EXAMINING PROCEDURE §§ 902–05 (8th ed.

Rev. 7, Sept. 2008).

132. MANUAL OF PATENT EXAMINING PROCEDURE, supra note 131, at § 905; Classification Standards and

Development, U.S. PAT. & TRADEMARK OFF., https://www.uspto.gov/patents-application-process/patent-

search/classification-standards-and-development (last visited Mar. 13, 2018).


Our analysis provides an aggregate view of the impact of Alice in the Patent

Office. We show that, as our intuition would suggest, Alice has resulted in an

increase in subject matter rejections, and that these rejections fall

disproportionately into a few specific subject matter areas. Stepping back, our

analysis shows that the impact of Alice on the stream of data produced by the

Patent Office is not random—instead, it includes a pattern or signal that can be

recognized by machine learning techniques, as shown below in Section V.

To begin our data analysis, we pulled a set of file histories from a uniform

random sample of about 20,000 patent applications filed during or after 2013.133

In the random sample of 20,000 applications, we found a total of 7,367 office

actions arising from 7,160 unique applications which had received at least one

office action. Of the 7,160 unique applications, we found 460 (6.4%) that

included at least one Alice rejection.

Plot 1, below, supports our selection of the relevant time period as being

after October 2014. Subject matter rejections were identified by searching for

particular text strings in office actions, as described above. Plot 1 shows the

monthly fraction of office actions containing a subject matter rejection. The data

in Plot 1 is based on our random sample of about 20,000 patent applications.

We then counted the number of office actions associated with the sample and

issued in a given month that contained a subject matter rejection. The error bars

provide the 95% confidence interval for the measured proportion.
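The sketch below assumes the common normal-approximation interval for a proportion, p plus or minus 1.96 times sqrt(p(1 - p)/n); other interval constructions would serve equally well, and the choice does not affect the pattern visible in Plot 1.

    import math

    def proportion_ci(rejections, total, z=1.96):
        """95% normal-approximation confidence interval for the fraction of a
        month's office actions that contain a subject matter rejection."""
        p = rejections / total
        half_width = z * math.sqrt(p * (1 - p) / total)
        return max(0.0, p - half_width), min(1.0, p + half_width)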

Plot 1: Subject Matter Rejections by Month

133. As described above, we preferentially search for applications in classes that are rich in Alice rejections.

This process, while useful for finding examples of rejections for purposes of machine learning, skews the data

set in favor of particular subject matter classes where Alice rejections are common. This skew complicates any

reporting of generalized statistics or distribution of rejections. For this reason, the data analysis presented here

is based on a random sample of applications.


In Plot 1, a marked increase in subject matter rejections occurs in the

July/August, 2014 timeframe. This increase is consistent with the publication

of preliminary examination instructions for subject-matter eligibility by the

Patent Office on June 25, 2014.134

Plot 2, below, tells us which subject matter areas are subject to high

numbers of Alice rejections. The plot is based on the same random sample of

about 20,000 patent applications discussed above. The graph shows the total

number, by Cooperative Patent Classification (CPC) class,135 of applications that

have at least one Alice rejection. CPC classes having fewer than three rejections

were eliminated from consideration.

Plot 2: Alice rejections by CPC class

Table 1, below, provides descriptions for many of the CPC classes shown

in Plot 2. The classes with the highest number of rejections are G06Q (business

methods), G06F (digital data processing), and H04L (digital information

transmission).

Plot 3, below, provides another view of the random sample of 20,000

applications. The graph breaks down the total number of cases in each class into

those that have at least one Alice rejection and those without. The black and

light grey bars respectively represent the number of cases with and without an

Alice rejection.

134. Preliminary Examination Instructions, supra note 8.

135. See MANUAL OF PATENT EXAMINING PROCEDURE, supra note 131, at § 905 (describing the Cooperative

Patent Classification scheme by which the Patent Office groups applications by subject matter area).


Plot 3: Cases With/Without Alice Rejection by CPC class

As can be seen in Plot 3, the total sample size is quite small for many of

the classes. In order to improve the statistical significance of our findings, we

performed further data collection, focusing on those CPC classes from our

random sample that included at least some Alice rejections. This additional data

collection resulted in a larger, non-uniform sample of about 38,000 office

actions, of which about 3,500 included an Alice rejection.

Plot 4, below, tells us the percentage of applications in each class that have

been subjected to an Alice rejection. Plot 4 was generated based on our larger,

non-uniform data sample described above. The data in this sample was

purposely skewed towards those classes where Alice rejections are more

common, in order to determine a more statistically accurate rejection rate for

those classes.

Plot 4: Percentage of Alice rejections by CPC class


It is notable that in several subject matter areas, over 40% of the

applications are subjected to Alice rejections. Table 1, below, provides the titles

for the CPC classes shown in Plot 4, above.

Table 1: CPC class descriptions

Class  Rejection Rate   n     Std. Error  Description
A61B   5.5%             1424  0.6         DIAGNOSIS; SURGERY; IDENTIFICATION
A61K   3.9%             2086  0.4         PREPARATIONS FOR MEDICAL, DENTAL, OR TOILET PURPOSES
A63F   43.3%            254   3.1         CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES …; VIDEO GAMES
B25J   10.2%            108   2.9         MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
B60W   12.6%            207   2.3         CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION …
C07K   7.6%             514   1.2         PEPTIDES
C12N   7.9%             164   2.1         MICRO-ORGANISMS OR ENZYMES; COMPOSITIONS THEREOF
C12Q   22.4%            183   3.1         MEASURING OR TESTING PROCESSES INVOLVING ENZYMES OR MICRO-ORGANISMS …
F01N   12.3%            162   2.6         GAS-FLOW SILENCERS OR EXHAUST APPARATUS
G01C   20.6%            218   2.7         MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; …
G01N   9.1%             942   0.9         INVESTIGATING OR ANALYSING MATERIALS …
G06F   10.2%            6501  0.4         ELECTRICAL DIGITAL DATA PROCESSING
G06K   5.6%             268   1.4         COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
G06Q   65.9%            1801  1.1         DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; …
G06T   12.7%            675   1.3         IMAGE DATA PROCESSING OR GENERATION
G07F   51.6%            310   2.8         COIN FEED OR LIKE APPARATUS
G08G   21.8%            55    5.6         TRAFFIC CONTROL SYSTEMS
G09B   39.7%            131   4.3         EDUCATIONAL OR DEMONSTRATION APPLIANCES; … MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
G10L   28.4%            201   3.2         SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING;
H04B   4.2%             240   1.3         TRANSMISSION
H04L   9.9%             3272  0.5         TRANSMISSION OF DIGITAL INFORMATION
H04M   9.7%             632   1.2         TELEPHONIC COMMUNICATION
H04N   4.2%             2277  0.4         PICTORIAL COMMUNICATION E.G. TELEVISION
H04W   3.9%             2401  0.4         WIRELESS COMMUNICATION NETWORKS

As shown in Table 1, the subject matter areas subject to the most (by

percentage) Alice rejections include “business methods”—data processing

systems for administration, commerce, finance, and the like (class G06Q);


games (classes A63F and G07F); educational devices, globes, maps, and

diagrams (class G09B); and speech analysis, synthesis, and recognition (G10L).

In this Section, we have presented an analysis of our dataset. Our analysis

indicates that the subject matter areas that contain many applications with

Alice rejections include data processing, business methods, games, educational

methods, and speech processing. In the following Section, we use our dataset to

train a machine classifier to predict Alice rejections.

V. PREDICTING ALICE REJECTIONS WITH MACHINE CLASSIFICATION

Our core research goal is to predict, based on the text of a patent claim,

whether the claim will be rejected under Alice. One way to make this prediction

is to cast the exercise as a document classification problem. In document

classification, a document is assigned to a particular class or category based on

features of the document, such as words, phrases, length, or the like.136

Document classification can be automated by implementing on a computer

the logic used to make classification decisions. One very successful example of

automated document classification is found in the “spam” filter provided by

most modern email services.137 A typical spam filter classifies each received

email into spam (i.e., junk mail) or non-spam (i.e., legitimate email) based on

features of the email, such as its words, phrases, header field values, and the

like.138

While it is possible to manually implement the decision logic used to

classify documents, it is more common to use machine learning.139 At a high

level of generality, machine learning is a technique for training a model that

associates input features with outputs.140 Machine learning can be supervised or

unsupervised.141 In supervised learning, a “teacher” trains a model by presenting

it with examples of input and output pairs.142 The model is automatically

adjusted to express the relationship between the observed input-output pairs.143

The model can then be validated by testing it against novel inputs and tallying

how often the model makes the correct classification.144 In unsupervised

learning, the goal is to identify patterns in a dataset without guidance provided

by a teacher.145 In unsupervised learning, a model is generated without the use

of input-output pairs as training examples.146

Our methodology employs supervised learning. Our goal is to teach a

machine to classify a patent claim as eligible or ineligible based on its words.

136. RUSSELL & NORVIG, supra note 5, at 865.

137. Mehran Sahami et al., A Bayesian Approach to Filtering Junk E-Mail, in AAAI TECHNICAL REPORT

1998, at 55–56 (No. WS–98–05).

138. Id.

139. RUSSELL & NORVIG, supra note 5, at 693–95.

140. Id.

141. Id. at 695.

142. Id.

143. See id. at 697–703 (discussing an approach to generating a decision tree based on observed examples).

144. Id. at 708–09.

145. Id. at 694.

146. Id.


The input-output pairs used for training are obtained from our dataset, described

above, which associates patent claims with corresponding classifications—

eligible or not eligible—made by human examiners at the Patent Office. As

described in detail below, these classifications can be used to train and evaluate

various common supervised machine learning models.

In the following Subsections, we begin with an exploratory analysis that

identifies particular words that are associated with eligibility or ineligibility

under Alice.147 The presence of such associations indicates that there exist

patterns that can be learned by way of machine learning.148 Next, we describe

the training, testing, and performance of a baseline classifier, in addition to

techniques for an improved classifier.

A. Word Clouds

When initially exploring whether it would even be possible to predict

subject matter rejections based on the words of a patent claim, we first explored

the associations between claim terms and Alice rejections. Plot 5, below,

includes two word clouds that can be used to visualize such associations. In Plot

5, the left word cloud depicts words that were highly associated with acceptable

subject matter, where the words are sized based on frequency. The right word

cloud depicts words highly associated with unacceptable subject matter. Each

word cloud was formed by building a frequency table, which mapped each word

to a corresponding frequency. A first frequency table was built for eligible

claims, and a second frequency table was built for ineligible claims. The highest

N words from each table were then displayed as a word cloud, shown below.

Plot 5: Raw frequency word clouds (left: eligible claims; right: ineligible claims)

147. See generally Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 134 S. Ct. 2347 (2014) (stating the standard

for subject matter eligibility).

148. See RUSSELL & NORVIG, supra note 5, at 2 (stating that machine learning extracts and extrapolates

patterns).


Note that many terms, such as “method,” “data,” and “device” appear with

high frequency in both accepted (left cloud) and rejected claims (right cloud).

This is not surprising as these are very common words in the patent claim

context. For our purposes, however, such common terms are not useful, because
they do not help distinguish one class from the other.

Plot 6, below, includes two frequency word clouds without common terms.

Put another way, the words shown in the clouds of Plot 6 are sized based on the

absolute value of the difference of the frequencies in Plot 5. Thus, a word that

is equally common in both data sets (accepted and rejected claims) should not

appear in either cloud.

Plot 6: Raw frequency without common terms (left: eligible claims; right: ineligible claims)

The word clouds of Plot 6 do a better job of matching our intuition about

what kinds of words might be associated with Alice rejections.149 In the right

(ineligible) word cloud, we see terms such as “method,” “computer,”

“information,” “associated,” “transaction,” “payment,” “account,” and

“customer.” These are all words that would be used to describe business and

financial methods, techniques that are in the crosshairs of Alice.150 On the left

side, in the eligible word cloud, there are more terms that are associated with

physical structures, including “portion,” “formed,” “surface,” “connected,”

“disposed,” “configured,” “material,” and the like.
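The frequency arithmetic behind these clouds is straightforward. The following minimal sketch uses Python's Counter to build the per-class frequency tables and to rank terms by the absolute difference used in Plot 6; rendering the clouds themselves is left to any word-cloud plotting library, and the function names are illustrative:

    from collections import Counter

    def word_frequencies(claims):
        """Pool the (stemmed) tokens of a set of claims into one frequency table."""
        counts = Counter()
        for claim in claims:
            counts.update(claim.lower().split())
        return counts

    def distinctive_terms(eligible_claims, ineligible_claims, top_n=50):
        """Rank terms by the absolute difference of their frequencies in the
        eligible and ineligible tables; the sign of the difference indicates
        which cloud a term belongs in."""
        elig = word_frequencies(eligible_claims)
        inelig = word_frequencies(ineligible_claims)
        diff = {t: elig[t] - inelig[t] for t in set(elig) | set(inelig)}
        return sorted(diff.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_n]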

Table 2, below, lists the top twenty claim terms that are respectively

associated with patent eligibility or ineligibility.151

149. See Alice, 134 S. Ct. at 2349 (relating to terms closely related to a financial transaction).

150. See id. (regarding a financial transactions scheme).

151. See generally id. at 2355–57 (explaining the types of claims that are not patent eligible).


Table 2: Words Strongly Associated With (In)eligibility

Eligible Claims            Ineligible Claims
surface      control       method         implemented
portion      layer         computer       providing
connected    signal        associated     transaction
end          substrate     determining    generating
disposed     body          receiving      identifying
formed       material      information    storing
configured   light         user           account
direction    arranged      system         database
extending    member        data           payment
side         form          processor      game

B. Classifier Training

As noted above, predicting whether a patent claim will be subject to an

Alice rejection is a classification problem, similar to that of detecting whether

an email message is spam.152 At the end of the above-described data collection

process, we are in possession of a data set that includes about 20,000 claims,

each of which is labeled as “accept” (subject matter eligible—no Alice rejection

issued) or “reject” (subject matter ineligible—Alice rejection issued).153

Roughly 85% of the claims in the data set are patent eligible, while the remaining

15% are ineligible.154

The dataset is then used to train a classifier, which is a mathematical model

that maps input features of a document to one or more classes.155 For example,

as discussed above in the context of a spam filter for email, a classifier

determines the class of an email (i.e., spam or not spam) based on its features

(e.g., words).156 Similarly, in our application, we generate a classifier that

determines the class of a patent claim (i.e., valid or invalid) based on its features

(e.g., words).

In machine learning, a classifier can be trained in a supervised manner by

showing the classifier many examples of each class.157 The classifier learns that

certain features or combinations of features are associated with one class or the

other.158 In our case, we elected to show the classifier the words of a claim,

without reference to their order, meaning, or location in the claim. This is

sometimes known as a “bag of words” approach, in which a text passage is

152. RUSSELL & NORVIG, supra note 5, at 710.

153. As discussed in detail above, the “accept” claims in the data set were those obtained from applications

that had received an office action during the relevant time period (post Oct. 2014), so that we could be reasonably

certain that an examiner had evaluated the claim for Alice compliance. For machine learning purposes, we

limited the claims in the ACCEPT class to those from patents that issued during the relevant time period, because

we could be confident that those claims were in the form actually examined and approved by the examiner. This

reduced the number of total claims from about 29,000 to 22,000.

154. Our full training set included 21,693 claims, of which 2,963 (about 13.7%) were rejected as abstract.

155. See generally RUSSELL & NORVIG, supra note 5, at 696 (explaining the use of classifiers).

156. Id. at 710.

157. See id. at 697–703 (discussing an approach to generating a decision tree based on observed examples).

158. Id.


converted into a frequency table of terms.159 The words of the claim provide the

input features for the classifier; the output of the classifier is an indication of

whether the claim is patent eligible or not.160

Prior to training, we stemmed the words of the claims.161 Stemming

converts distinct words having the same roots but different endings and suffixes

into the same term.162 Stemming reduces the number of distinct words being

analyzed in a machine learning scenario.163 For example, words such as

“associate,” “associated,” “associating,” and “associates” may all be converted

to the term “associ.”164
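A minimal sketch of this preprocessing step, shown here with the Porter stemmer from the NLTK library; the particular library is illustrative, and any Porter implementation would do:

    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()

    def stem_claim(claim_text):
        """Lower-case, split on whitespace, and stem each claim term."""
        return " ".join(stemmer.stem(token) for token in claim_text.lower().split())

    # "associate", "associated", "associating", and "associates" all reduce to "associ"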

When training a classifier, the dataset is typically split into two subsets, a

training set and a testing set.165 The classifier is not exposed to examples in the

testing set until after training is complete, in order to obtain a true gauge of the

classifier’s performance.166 In our case, we set aside 20% of the cases for the

test set and used the remaining cases for training.

We then trained our classifiers using the remaining ineligible claims (about

2,400) and the same number of randomly selected eligible claims. By adjusting

the mix of eligible and ineligible claims, a classifier can be biased toward
classifying examples into the majority class of its training mix.167 In our case,
we used an even split to make the classifier more likely to recognize claims as

ineligible, at the cost of introducing additional false positives—claims classified

as ineligible that are in fact eligible. As discussed further below, the mix of

eligible and ineligible training examples can be adjusted to tune classifiers to

emphasize particular performance metrics.
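A minimal sketch of the hold-out split and the training-mix adjustment, using scikit-learn utilities. The stratified split and the resampling helper are assumptions for illustration; the ratio parameter is 1.0 for the even split described above and can be raised (for example, to 2.5 for a 5:2 mix) to reproduce the tuning discussed below:

    from sklearn.model_selection import train_test_split
    from sklearn.utils import resample

    def make_training_mix(claims, labels, eligible_per_ineligible=1.0, seed=0):
        """Hold out 20% of the cases for testing, then downsample the eligible
        (majority) training examples to the requested eligible:ineligible ratio."""
        X_train, X_test, y_train, y_test = train_test_split(
            claims, labels, test_size=0.2, stratify=labels, random_state=seed)

        elig = [x for x, y in zip(X_train, y_train) if y == "accept"]
        inelig = [x for x, y in zip(X_train, y_train) if y == "reject"]
        n_keep = int(len(inelig) * eligible_per_ineligible)
        elig = resample(elig, replace=False, n_samples=n_keep, random_state=seed)

        X_mix = elig + inelig
        y_mix = ["accept"] * len(elig) + ["reject"] * len(inelig)
        return X_mix, y_mix, X_test, y_test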

C. Performance of a Baseline Classifier

There exist many different machine classification techniques.168 Modern

machine-learning toolkits provide implementations of multiple classifiers via

uniform programming interfaces.169 In this study, we started by training a

159. Id. at 866.

160. E.g., Dugan, supra note 11.

161. Martin Porter, An Algorithm for Suffix Stripping, 14 PROGRAM 130–38 (1980).

162. Alvise Susmel, Machine Learning: Working with Stop Words, Stemming, and Spam, CODE SCHOOL

BLOG (Mar. 25, 2016), https://www.codeschool.com/blog/2016/03/25/machine-learning-working-with-stop-

words-stemming-and-spam/; see also RUSSELL & NORVIG, supra note 5, at 870 (showing examples of

stemming).

163. RUSSELL & NORVIG, supra note 5, at 870.

164. By converting variations of a word onto a single term, stemming has the effect of condensing a sparse

feature set. With a sufficiently large number of samples, stemming may not be necessary, and may even degrade

classifier performance. CHRISTOPHER MANNING ET AL., AN INTRODUCTION TO INFORMATION RETRIEVAL 339

(Cambridge University Press Online ed. 2009), https://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf.

165. RUSSELL & NORVIG, supra note 5, at 695.

166. Id. at 695–96.

167. See Yanmin Sun et al., Classification of Imbalanced Data: A Review, 23 INT’L J. PATTERN

RECOGNITION & ARTIFICIAL INTELLIGENCE 687–719 (2009) (explaining how a classifier can be biased towards

the majority class).

168. See, e.g., RUSSELL & NORVIG, supra note 5, at 717–53 (discussing various approaches to supervised

learning, including decision trees, logistic regression, and neural networks).

169. See Fabian Pedregosa et al., Scikit-learn: Machine Learning in Python, 12 J. MACHINE LEARNING

RES. 2825–30 (2011), (explaining how scikit-learn works); see also Scikit-learn Online Documentation,

http://scikit-learn.org (last visited Mar. 13, 2018) (demonstrating an example of a machine learning tool kit).


Logistic Regression classifier on an equal mix of eligible and ineligible example

claims.170

Table 3: Logistic Regression Classifier Performance Results

Classifier              Class    Precision  Recall  F-score  Accuracy  MCC
Logistic Regression171  accept   0.960      0.747   0.840
                        reject   0.332      0.803   0.470
                        average  0.875      0.755   0.790    0.755     0.401

Table 3 above provides performance data for a baseline Logistic

Regression classifier.172 Performance was measured by first training the

classifier using claims in the training data set, then exposing the trained classifier

to claims in the test data set, and finally tabulating the resulting classifications.

The resulting classifications can be compared to the actual classifications (made

by human examiners at the Patent Office) in order to determine how well the

classifier performed.

Many metrics are available to evaluate the performance of a classifier.173

Precision is the fraction of the instances classified into a given class that are

correctly classified.174 Our baseline classifier has a precision in the “accept”

class of about 0.96. This means that for every 100 claims classified as patent

eligible, about ninety-six of them are correctly classified.175 Recall is the

fraction of relevant instances that are correctly classified.176 Our baseline

classifier has a recall in the “accept” class of about 0.75. This means that if there

are 100 claims that are patent eligible, the classifier will find (correctly classify)

about seventy-five of them.177

A number of aggregate performance metrics are also available. The F-score is the harmonic mean of precision and recall.178 Accuracy reflects the

fraction of test cases correctly classified.179 Note that accuracy is not necessarily

a useful metric in the presence of imbalanced data.180 For example, if 90% of

170. RUSSELL & NORVIG, supra note 5, at 725–27; Logistic Regression, SCIKIT LEARN, http://scikit-

learn.org/stable/modules/linear_model.html#logistic-regression (last visited Mar. 13, 2018).

171. Logistic Regression, SCIKIT LEARN, http://scikit-learn.org/stable/modules/linear_model.html#

logistic-regression (last visited Mar. 13, 2018).

172. See RUSSELL & NORVIG, supra note 5, at 717–27 (explaining how to calculate a logistic regression).

173. See id. at 723–48 (explaining several ways to measure a classifier).

174. Id. at 869.

175. See Table 3; RUSSELL & NORVIG, supra note 5, at 869 (providing an example of how to calculate

precision).

176. RUSSELL & NORVIG, supra note 5, at 717–27.

177. See Table 3; RUSSELL & NORVIG, supra note 5, at 869 (providing an example of how to calculate

recall).

178. RUSSELL & NORVIG, supra note 5, at 869.

179. Note that accuracy is the same as weighted average recall.

180. Yanmin Sun et al., supra note 167, at 696; see Josephine Sarpong Akosa, Predictive Accuracy: A
Misleading Performance Measure for Highly Imbalanced Data, LINKEDIN (January 24, 2017),
https://www.linkedin.com/pulse/predictive-accuracy-misleading-performance-measure-highly-akosa/
(discussing the lack of accuracy in an imbalanced data set).


the claims are eligible, a “null” classifier that classifies every claim as eligible

will have an accuracy of 90%. Such a classifier would of course not be useful

in the real world, but does provide a useful baseline for evaluating a machine

learning model.

Matthews Correlation Coefficient (denoted MCC in the table) measures the

quality of a binary classification.181 It is effective even in the presence of

imbalanced data, such as is present in this study.182 MCC is a value between -1

and +1, where +1 occurs when every prediction is correct, 0 occurs when the

prediction appears random, and -1 occurs when every prediction is incorrect.183

Our baseline Logistic Regression classifier has an MCC score of 0.40.
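All of these metrics are available off the shelf. A minimal sketch, assuming y_true holds the examiner labels for the held-out test claims and y_pred holds the classifier's predictions:

    from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                                 precision_score, recall_score)

    def performance_report(y_true, y_pred):
        """Per-class ('reject') and aggregate metrics for one test run."""
        return {
            "reject_precision": precision_score(y_true, y_pred, pos_label="reject"),
            "reject_recall": recall_score(y_true, y_pred, pos_label="reject"),
            "reject_f1": f1_score(y_true, y_pred, pos_label="reject"),
            "accuracy": accuracy_score(y_true, y_pred),
            "mcc": matthews_corrcoef(y_true, y_pred),
        }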

As noted above, by changing the mix of training examples, we can adjust

a classifier’s view of the ground truth of the world. For example, if, during

training, a classifier sees a mix of 85% eligible cases and 15% ineligible cases,

it will tend to be much more likely to classify a given example as eligible. If

instead the classifier is trained on an equal mix of eligible and ineligible cases,

we would expect it to be less biased towards eligible classifications.184 This

relationship is illustrated in Plot 7, below.

Plot 7: Impact of Training Mix on Classifier Performance Metrics


181. Matthews Correlation Coefficient, SCIKIT LEARN, http://scikit-learn.org/stable/modules/generated/

sklearn.metrics.matthews_corrcoef.html (last visited Mar. 13, 2018).

182. Id.

183. Id.

184. Id.; see Table 3.


The data for Plot 7 was obtained by training a Logistic Regression classifier

using different mixes of eligible and ineligible example claims. Note that at a

1:1 ratio of eligible to ineligible training examples, the recall rate for the accept

(eligible) and reject (ineligible) classes is roughly equal at 0.75. However, at

that ratio, the reject precision is quite low, around 0.3, meaning that the classifier

produces many false positives. The MCC metric, which considers true and false

positives and negatives, begins to level off at around 0.45 at a ratio of about 2:1.

D. Performance of an Improved Classifier

We next attempted to develop a classifier that improved upon the MCC

score for our baseline Logistic Regression classifier. We implemented our

improved classifier as an ensemble of multiple different classifiers.185 Each

classifier was trained using an adjusted example mix, using Plot 7 as a guide.

By inspecting Plot 7, and running a number of trials, we learned that training our

classifiers on a ratio of eligible to ineligible examples of 5:2 tended to maximize

the MCC score.

The selection of the particular types of classifiers was to a large extent

exploratory and arbitrary. The machine learning toolkit utilized in this study

provides many different classifiers that each have a uniform interface for

training and prediction.186 Thus, given our initial data set, it is almost trivial to

experiment with different classification approaches known in the art. Training

and testing multiple distinct classification schemes allowed us to understand

whether some types of classifiers outperformed our baseline Logistic Regression

classifier, above.

Ensemble classification aggregates the outputs of multiple classifiers,

thereby attempting to overcome misclassifications made by any particular

classifier.187 In our ensemble, we employed a voting scheme, in which a final

classification was based on the majority outputs for a given test case provided

to each of our multiple classifiers. Table 4, below, provides the performance

results of our improved classifier.

185. Supervised Learning, SCIKIT LEARN, http://scikit-learn.org/stable/supervised_learning.html (last

visited Mar. 13, 2018) (providing examples of different classifiers, in addition to Logistic Regression, such as a

Naïve Bayesian classifier, a Decision Tree classifier, a Random Forest classifier, a Support Vector Machine

classifier, a Stochastic Gradient Descent classifier, an AdaBoost classifier, a K-Neighbors classifier, and a

Gradient Boosting classifier).

186. Id.

187. RUSSELL & NORVIG, supra note 5, at 748–52.


Table 4: Tuned Machine Classification Performance Results

Classifier                       Class    Precision  Recall  F-score  Accuracy  MCC
Logistic Regression                       0.876      0.867   0.871    0.867     0.448
Naïve Bayes                               0.872      0.533   0.600    0.533     0.259
Decision Tree                             0.848      0.844   0.846    0.844     0.325
Random Forest                             0.872      0.873   0.872    0.873     0.432
Support Vector Machine                    0.871      0.884   0.875    0.884     0.419
Gradient Descent                          0.876      0.876   0.876    0.876     0.449
AdaBoost                                  0.871      0.858   0.864    0.858     0.426
K-Neighbors                               0.856      0.829   0.840    0.829     0.355
Gradient Boosting                         0.863      0.865   0.864    0.865     0.392
Ensemble (multiple classifiers)  accept   0.933      0.933   0.933
                                 reject   0.551      0.552   0.552
                                 average  0.884      0.884   0.884    0.884     0.485

The bottom section of Table 4 shows the overall performance of the

ensemble classifier, including metrics for the individual classes and its average

performance. The upper nine rows of Table 4 show the average performance of

each of the individual classifiers that make up our ensemble.

We need to be careful to not make too much of the accuracy numbers

above. At first blush, accuracy scores approaching 90% seem quite impressive,

but we must remember that a “dummy” classifier that always classifies every

example as patent eligible would have an accuracy of about 85%, given the mix

of our population.188 However, such a dummy classifier would have an MCC

score of 0, because it would never correctly classify an ineligible claim.189

Therefore, we put more weight in the MCC score, which reached about 0.485 in

the above test run. Note that using a voting ensemble did yield performance

gains, as no individual classifier attained an MCC score over 0.449.
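scikit-learn's VotingClassifier offers one convenient way to implement this kind of majority vote. The three-member pipeline below is a reduced sketch, not our actual nine-classifier configuration:

    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    ensemble = make_pipeline(
        CountVectorizer(max_features=1000),      # bag-of-words features
        VotingClassifier(
            estimators=[
                ("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier()),
                ("nb", MultinomialNB()),
            ],
            voting="hard",                       # majority vote over predicted labels
        ),
    )

    # ensemble.fit(train_claims, train_labels)
    # ensemble.predict(["a method comprising receiving a payment request ..."])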

In the end, the best classifier is the one that does the best job of meeting

the particular requirements of its application. Plot 7 shows us how to train

classifiers to meet particular requirements. For example, if the classifier is to be

used as an “early warning” system to flag patent claims that may have eligibility

problems under Alice,190 then we would like a classifier that has a high recall

score for ineligible claims. As can be seen in Plot 7, this will come at a loss of

precision, meaning that the classifier will identify many false positives—claims

classified as ineligible that are not actually ineligible. Of course, this loss of

precision may be acceptable if it is really important to catch as many ineligible

claims as possible.

188. DummyClassifier, SCIKIT LEARN, http://scikit-learn.org/stable/modules/generated/sklearn.dummy.

DummyClassifier.html (last visited Mar. 13, 2018); see Table 3.

189. Matthews Correlation Coefficient, SCIKIT LEARN, http://scikit-learn.org/stable/modules/generated/

sklearn.metrics.matthews_corrcoef.html (last visited Mar. 13, 2018).

190. See generally Alice Corp. v. CLS Bank Int’l, 134 S. Ct. 2347 (2014) (stating the standard for subject

matter eligibility).


E. Extensions, Improvements, and Future Work

Our machine learning process described above may be improved in many

ways. First, other features could be considered. Currently, the only features

being considered are term frequencies. Other features that are not currently

being considered include claim length in words, the number of syntactic claim

elements (e.g., the number of clauses separated by semi-colons),191 the number

of gerunds (e.g., receiving, transmitting),192 or the like.

Also, the current approach measures only the occurrences of single terms,

using a “bag of words” approach.193 In this approach, each claim is reduced to

a frequency table that associates each claim term with the number of times it

occurs in the claim. Therefore, no information is retained about the location or

co-occurrence of particular words. This issue can be addressed at least in part

by the use of n-grams. An n-gram is a sequence of terms of length n that appear

in the given text.194 For example, bigrams (2-grams) represent sequential pairs

of words in the claim.

It is possible that an n-gram-based representation will provide additional

features (e.g., specific 2- or 3-word sequences) that are highly correlated with

eligible or ineligible claims.195 For example, “computer interface” might be

differently correlated with eligible or ineligible claims than the word “computer”

or “interface” taken individually.196 This may be because, for example, the word

“interface” has many definitions, including: (1) the interaction between two

entities or systems; (2) a device or software for connecting computer

components; (3) fabric used to make a garment more rigid.197 The first of these

definitions is probably highly associated with ineligible claims, while the last of

these definitions is probably highly associated with eligible claims, with the

second definition likely appearing somewhere in the middle. By using 2-grams

in this case, it may be possible to distinguish which of these three cases applies.
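In implementation terms, moving from single terms to n-grams is a one-parameter change to the feature extractor. A sketch using scikit-learn's CountVectorizer with unigrams and bigrams (the example claim fragments are invented for illustration):

    from sklearn.feature_extraction.text import CountVectorizer

    # ngram_range=(1, 2) keeps single terms and adds adjacent word pairs, so
    # "computer interface" becomes a feature distinct from "computer" and "interface".
    vectorizer = CountVectorizer(ngram_range=(1, 2))
    features = vectorizer.fit_transform([
        "a computer interface for receiving a payment request",
        "an interface fabric disposed between the outer layers of the garment",
    ])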

Other potential improvements include the use of a larger dictionary.198 The

current approach has typically utilized just the most significant terms in the data

set, generally set around 1,000 terms.199 Keeping the term dictionary small

facilitates rapid classifier training time, and thus the ability to experiment with

many different classifier parameter settings. In a production setting, however,

191. Clauses and Clause Elements, http://folk.uio.no/hhasselg/grammar/Week2_syntactic.htm (last visited

Mar. 13, 2018).

192. Gerund, DICTIONARY.COM, www.dictionary.com/browse/gerund (last visited Mar. 13, 2018).

193. RUSSELL & NORVIG, supra note 5, at 866.

194. Id. at 861.

195. Id.

196. Id.

197. Technically, this fabric is called “interfacing,” although if stemming is employed, then “interface” and

“interfacing” will likely be stemmed to the same term.

198. See RUSSELL & NORVIG, supra note 5, at 921 (providing examples of dictionaries with numerous word

entries).

199. JURE LESKOVEC, ANAND RAJARAMAN & JEFFREY D. ULLMAN, MINING OF MASSIVE DATASETS 8

(2014), http://infolab.stanford.edu/~ullman/mmds/book.pdf. Significance is determined using the TF-IDF

(Term Frequency times Inverse Document Frequency) measure, which reduces the weight of words that are very

common in a given corpus.


using a larger dictionary may yield marginal but useful performance

improvements.200
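A sketch of the dictionary cap, with the size exposed as a parameter so that a production configuration could raise it. Note that scikit-learn's max_features keeps the most frequent terms, which only approximates the TF-IDF-based selection described in footnote 199:

    from sklearn.feature_extraction.text import TfidfVectorizer

    def build_vectorizer(dictionary_size=1000):
        """TF-IDF weighted bag-of-words features over a capped vocabulary.
        max_features retains the most frequent terms in the corpus, an
        approximation of the 'most significant by TF-IDF' selection."""
        return TfidfVectorizer(max_features=dictionary_size)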

In this Section, we have described the supervised training and performance

of machine classifiers that are capable of predicting whether a given patent claim

is valid under Alice. Using the Matthews Correlation Coefficient as our

preferred metric, we have developed classifiers that obtain scores in excess of

0.40 when evaluated against test data held out from our overall dataset. In the

following Sections, we describe two applications for our machine classifier.

First, in Section VI, we present a Web-based patent claim evaluation system that

uses our classifier to predict Alice compliance for a given patent claim. Next, in

Section VII, we use classifier to quantitatively estimate the impact of Alice on

the universe of issued patents.

VI. A PATENT CLAIM EVALUATION SYSTEM

In the present Section, we describe a Web-based patent claim evaluation

system that employs our automatic predictive classification techniques

described above. After presenting our system, we describe a number of use

cases for the system within the context of the patent lifecycle, including

application preparation, prosecution, enforcement, valuation, and transactions.

We conclude with a brief outline of some of the issues presented by the use of

“machine intelligence” in the context of legal work.

Our patent claim evaluation system is an example of a computer-assisted legal service, the application of computer function and intelligence to the

efficient rendering of legal services. While the legal field has been slow to adopt

the efficiencies obtained from information technologies, it has not been immune

to change.201 The word processing department in a law firm is gradually giving

way to self-service word processing and speech recognition dictation

software.202 While many attorneys still like to “look it up in the book,” legal

research is increasingly being performed via computer.203 Manual document

review during litigation is being replaced by electronic discovery, including

machine classification to discover responsive documents.204 The claim

evaluation system described below supports the analytic functions traditionally

performed by an attorney, by helping the attorney identify problematic claims,

200. Preliminary results do not show significant improvement with a larger dictionary. This may be due

to our sample size in relation to the size of the dictionary—there are likely not enough samples to determine the

true relationship between a rare feature (term) and an eligible or ineligible classification.

201. Blair Janis, How Technology is Changing the Practice of Law, A.B.A., https://www.americanbar.org/

publications/gp_solo/2014/may_june/how_technology_changing_practice_law.html (last visited Mar. 13,

2018).

202. Nerino J. Petro, Jr., Speech Recognition: Is it Finally Ready for the Law Office, A.B.A.,

https://www.americanbar.org/publications/law_practice_home/law_practice_archive/lpm_magazine_articles_v

33_is2_an3.html (last visited Mar. 13, 2018).

203. Sally Kane, Legal Technology and the Modern Law Firm, BALANCE (May 15, 2017),

https://www.thebalance.com/technology-and-the-law-2164328.

204. Id.


much in the same way as a doctor would employ a medical test to identify

disease.205

A. System Description

We have implemented a proof-of-concept Web-based classification

application (“the claim evaluator”) that can be used to evaluate patent claim text

for subject matter eligibility.206 The claim evaluator receives a patent claim

input into an HTML form presented in a Web browser.207 The claim text is

stemmed and presented to one or more machine classifiers trained as described

above.208 The classifiers provide a result that is presented on the displayed Web

page.209
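A minimal sketch of such a front end, written here with the Flask framework; the stem_claim helper and the list of trained classifiers are assumed to come from the training pipeline described in Section V, and the HTML is reduced to a bare form:

    from flask import Flask, request

    # stem_claim() and classifiers (a list of trained text-classification
    # pipelines) are assumed to be provided by the training code in Section V.
    app = Flask(__name__)

    FORM = """<form method="post">
                <textarea name="claim" rows="12" cols="80"></textarea>
                <input type="submit" value="Evaluate">
              </form>"""

    @app.route("/", methods=["GET", "POST"])
    def evaluate():
        if request.method != "POST":
            return FORM
        claim = stem_claim(request.form["claim"])          # stem exactly as during training
        votes = [clf.predict([claim])[0] for clf in classifiers]  # one vote per trained pipeline
        verdict = max(set(votes), key=votes.count)         # majority vote of the ensemble
        tally = f"{votes.count(verdict)} of {len(votes)} classifiers"
        return f"<p>Result: {verdict} ({tally})</p>" + FORM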

Below is a screenshot showing an example output of the claim evaluator,

given as input an example claim directed to a programmable computer memory

system. This claim is obtained from U.S. Patent No. 5,953,740210 and was not

used to train our classifier.211 The evaluator classifies this as eligible, with five

out of nine (55%) classifiers of the ensemble in agreement.

This claim was also analyzed on appeal by the CAFC in Visual Memory v. NVIDIA Corp.212 There, a three-judge panel held the claim to be patent eligible

under Alice by a two-to-one vote.213 Interestingly, the five-to-four vote of our

ensemble of classifiers indicates that this was also a somewhat close case for our

classification system.214

205. Dugan, supra note 11; see Gene Quinn, Understanding Patent Claims, IPWATCHDOG, (July 12, 2014)

http://www.ipwatchdog.com/2014/07/12/understanding-patent-claims/id=50349/ (providing a brief summary of

patent claims and the challenges in drafting them).

206. The use of a proof-of-concept helps determine how a software system would actually work, and

whether it is viable as a tool or solution. Odysseas Pentakolos, Proof-of-Concept Designs, MICROSOFT

DEVELOPER NETWORK, (Jan. 2008) https://msdn.microsoft.com/en-us/library/cc168618.aspx; Dugan, supra

note 11.

207. Dugan, supra note 11.

208. At the time of writing, the ensemble consists of the nine classifiers listed in Table 4 and footnote 185

above. The output of the evaluator is based on the “votes” of each of the classifiers.

209. Dugan, supra note 11.

210. U.S. Patent No. 5,953,740 (filed Oct. 5, 1993).

211. Dugan, supra note 11; see RUSSELL & NORVIG, supra note 5, at 444–45 (explaining that using test

data that was not present in the training set tests the actual accuracy of the training set data, and will help

determine whether more training is necessary).

212. Visual Memory LLC v. NVIDIA Corp., 867 F.3d 1253, 1257 (Fed. Cir. Aug. 2017).

213. See id. at 1260–62.

214. Id.


Screenshot 1

At the bottom of the screenshot, shown above, stemmed claim terms are

highlighted red or green to respectively indicate a positive or negative

association with patent eligibility under Alice.215 This feature can help the user

redraft an example claim to recharacterize the invention at a lower level of

abstraction. In the example above, the stemmed terms “comput,” “data,”

“processor,” and “store” are colored red to indicate a correlation with patent

ineligibility. Terms such as “oper,” “main,” “memori,” “configur,” “cache,”

“connect,” and “determine” are colored green to indicate a correlation with

patent eligibility. The remaining terms are colored black to indicate a lack of

strong correlation in either direction.

Below is a second screenshot of the claim evaluator. This time the

evaluator has been asked to evaluate a claim directed to a method for providing

a performance guarantee in a transaction. This claim is obtained from U.S.

Patent No. 7,644,019.216

215. See Logistic Regression, SCIKIT LEARN, http://scikit-learn.org/stable/modules/linear_model.html#

logistic-regression (last visited Mar. 13, 2018) (highlighting a term is based on a coefficient for that term

determined by the logistic regression classifier of the ensemble).

216. U.S. Patent No. 7,644,019 (filed Apr. 21, 2003).


Screenshot 2

In Screenshot 2, the evaluator classifies this claim as ineligible, with all

nine classifiers in the ensemble in agreement. This result is consistent with our

intuition that business-method type claims are more likely than memory system

claims (as in Visual Memory, above) to be invalid under Alice.217 This result is

also consistent with the decision of the CAFC, which held this claim invalid

under Alice in buySAFE v. Google.218

B. Claim Evaluation System Use Cases

The described claim evaluator is first and foremost useful for helping

understand whether it is even worth applying for a patent directed to a particular

invention. There are many hurdles that must be cleared before obtaining a

patent.219 It is frustrating for clients that a patent attorney cannot give reasonable

assurances that he or she will be able to draft a patent application having claims

that can overcome all of these hurdles.220 For example, while a prior art search

can uncover at least some of the relevant prior art, it is very difficult to predict

how an examiner might combine the teachings of multiple prior art references

to generate a rejection for obviousness.221

217. See generally Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 134 S. Ct. 2347 (2014); Visual Memory LLC
v. NVIDIA Corp., 867 F.3d 1253 (Fed. Cir. 2017) (finding a business-method claim invalid in Alice, but holding
a memory system claim in Visual Memory valid).

218. buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355 (Fed. Cir. 2014).

219. Patent Process Overview, U.S. PATENT & TRADEMARK OFFICE, https://www.uspto.gov/patents-

getting-started/patent-process-overview#step1 (last visited Mar. 13, 2018).

220. Gene Quinn, Why Patent Attorneys Don’t Work on Contingency, IPWATCHDOG, (Jul. 8, 2017)

http://www.ipwatchdog.com/2017/07/08/why-patent-attorneys-dont-work-on-contingency-2/id=85514/.

221. Gene Quinn, Understanding Obviousness: John Deere and the Basics, IPWATCHDOG, (Oct. 10, 2015)

http://www.ipwatchdog.com/2015/10/10/understanding-obviousness-john-deere-and-the-basics-2/id=62393/.


With the described evaluator, however, it is now at least possible to flag

inventions that may be subjected to a higher level of scrutiny under the Alice

subject matter test.222 For example, the client or the attorney can draft an abstract

or overview of the invention, which is then used as input to the evaluator. If the

evaluator tags the given text as ineligible, the client has reason to be concerned.
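Continuing the illustrative sketch given earlier, such a pre-filing check might look like the following, where the invention summary and the evaluate_claim helper are hypothetical examples rather than actual system inputs.

# Illustrative usage of the hypothetical evaluate_claim helper sketched above;
# 'models' is the ensemble returned by build_ensemble.
summary = ("A method for dynamically adjusting cache allocation in a "
           "multi-processor memory system based on observed access patterns.")
verdict, votes, total = evaluate_claim(models, summary)
print(f"{verdict} ({votes} of {total} classifiers voted eligible)")
if verdict == "ineligible":
    print("Consider recharacterizing the invention at a lower level of abstraction.")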

The evaluator is also useful when preparing the claims of a patent

application, and more generally for determining the preferred terms used to

describe the invention in the application text. For example, the attorney may

prepare a draft independent claim and provide it to the evaluator. Depending on

the output of the evaluator, the attorney may revise the claim to use different

terms, to describe the invention at a lower level of generality or abstraction, or

to claim the invention from a different aspect. Also, one can imagine extending

the function of the evaluator to suggest synonyms or related terms that are less

highly correlated with Alice rejections.

The described evaluator can also be used to analyze issued patent claims.

For example, a patentee could use the evaluator to determine whether an issued

patent claim is likely to survive a challenge under Alice if asserted during

litigation. As another example, a defendant or competitor could use the

evaluator to assess the likelihood of success of challenging the validity of a claim

under Alice in litigation or via a post-grant review.

It is also possible to imagine using the evaluator as part of an automated

(or computer-assisted) patent analysis or valuation system. Patent analysis,

whether performed as part of rendering an opinion of invalidity or determining

a patent valuation, is expensive business.223 Legal fees paid for patent analysis

are a source of high transaction costs faced by parties attempting to determine

whether or how to manage the risk presented by a patent, such as via acquisition,

license, or litigation.224 The described evaluator can be used to reduce these

costs because it can perform an initial analysis in an automated manner. This

analysis can be used as a data point in a patent valuation formula, as part of a

due diligence checklist, or the like.

C. Questions Arising From the Application of Machine Intelligence to the Law

The claim evaluation system described above demonstrates that machine

learning can be employed to at least support the analytic functions traditionally

performed by an attorney.225 The application of machine learning in this context

222. Alice, 134 S. Ct. at 2354–55.

223. See 2015 Report, supra note 59, at I-95 (stating that the median legal fee to prepare an invalidity
opinion is $10,000; while non-compliance with 35 U.S.C. § 101 is only one of several bases for invalidity, our
claim evaluator could still reduce the cost of a typical invalidity opinion by flagging for further review and
analysis claims that are likely invalid under Alice).

224. See generally, Rebecca S. Eisenberg, Patent Costs and Unlicensed Use of Patented Inventions, 78 U.

CHI. L. REV. 53 (2011).

225. See Roitblat et al., Document Categorization in Legal Electronic Discovery: Computer Classification
vs. Manual Review, 61 J. AM. SOC. INFO. SCI. & TECH. 70, 79 (2009), http://apps.americanbar.org/litigation/
committees/corporate/docs/2010-cle-materials/09-holding-line-reasonableness/09c-document-categorization.pdf
(noting that we are not the first to make this claim; for example, automatic document classification employed in
the context of electronic discovery has been shown to be at least as accurate as human review).


of course raises a host of questions. While answers to many of these questions

are beyond the scope of this Article, we will briefly address some of them below.

The first area of concern relates to the substitutability of machine

intelligence for human analysis. To address this issue, we first need to

understand whether the performance of the machine intelligence is even

comparable to that of a trained human analyst. Above, we have compared the

performance of our “proof of concept” classifier to the aggregated performance

of the human examination corps of the Patent Office. While we find that our

classifier is certainly better than guessing, it is still wrong over 10% of the time.

Is this an acceptable margin of error? The answer to that question depends on

the context. If the classifier is used to give a client an initial “heads up”

regarding a patentability issue (possibly without the cost of consulting an

attorney), then the answer is probably yes. If the classifier is used to determine

whether to file a multi-million dollar lawsuit, then the answer may be no.

The possibility of failure raises the specter of legal malpractice. It is

possible to construct a hypothetical in which an attorney is found liable for

malpractice for relying on an incorrect assessment provided by a machine

classifier or other “intelligent” agent. Yet, we think the benefits to clients

outweigh the risks, so long as the client is made to understand the uncertainties

that accompany any prediction.

In some ways, using our machine classifier is not so different from

performing a prior art search prior to preparing a patent application. Many

clients request searches in order to identify any “knock out” prior art.226 As an

initial matter, no patent attorney will assert that the results of a particular search

will guarantee smooth sailing before the Patent Office.227 There are simply too

many unknowns, including the limitations of search engines and the fact that

patent applications remain non-public and thus unsearchable for at least eighteen

months after filing.228 If knock out art is found, the client may elect to redirect

her legal resources to a different project.229 And if instead the client elects to

move forward, the patent attorney can use the search results to obtain a better

understanding of the prior art, which should result in claims that require fewer

interactions with the Patent Office, thereby also saving the client legal fees.230

Machine classification that identifies Alice-related issues early in the process can
be used similarly: to help a client decide where to best apply her limited
resources, and, further, to help an attorney craft better claims in borderline cases.

The second area of concern and inquiry is more philosophical in nature.

What does it mean when our classifier determines that a particular claim is not


226. Gene Quinn, Patent Search 101: Why US Patent Searches are Critically Important, IPWATCHDOG

(Jan. 13, 2018), http://www.ipwatchdog.com/2018/01/13/patent-search-101-patent-searches/id=92305/.

227. Id.

228. See 35 U.S.C. § 122 (2012) (requiring that an application be published eighteen months after its
earliest filing date for which benefit is sought); 37 C.F.R. § 1.211 (2017) (providing that when the applicant files
a non-publication request, the application will not publish unless and until it issues into a patent).

229. Quinn, supra note 226.

230. Id.


patent eligible? How can we rely on a system that first reduces an English

language sentence, with all of its syntactic structure and semantic import, to a

bag of stemmed tokens, and then feeds those tokens to a mathematical model

that computes a thumbs up or down, without any understanding of language and

without any explicit instruction in the rules of the Alice test? Questions such as

these have been debated since the earliest attempts to build intelligent

machines.231

We take no position on whether our classifier understands language or can

otherwise be considered intelligent. At a minimum, our classifier reasonably

approximates the behavior of an aggregated group of human patent examiners

that are doing their best to implement the Alice test in the examination context.

And while there may be nothing like understanding or rule processing in our

classifier, this does not mean that it does not have useful, practical applications.

In this Section, we have presented a claim evaluation system that employs

our machine classifier. We have also discussed its potential applications in the

context of rendering legal services in a post-Alice world. We have concluded

by briefly outlining some of the concerns and issues related to the use of machine

intelligence in the context of legal work. We conclude that our application of

machine intelligence can support the traditional analytic functions provided by

attorneys, while at the same time allowing clients to make better informed

decisions about the application of their limited economic resources.

VII. ESTIMATING THE IMPACT OF ALICE ON ISSUED PATENTS

In this Section, we use our approach to estimate the impact of the Alice

decision on the millions of in-force patents. For this Article, we estimated the

number of patents invalidated under Alice by classifying claims from a sample

of patents issued prior to the Alice decision. To perform the evaluation, we

employed the following approach. First, we trained a machine classifier as

discussed above. Second, we determined whether our classifier can serve as an

acceptable proxy for the decision making of the Federal Courts. Third, we

evaluated the first independent claim from 1% of the issued patents in our patent

corpus: about 40,000 patents issued between 1996 and 2016.

A. The Classifier

For this analysis, we used a Logistic Regression classifier. Compared to the
ensemble classifier discussed above, it is quick to train and can efficiently
process the large number

of patent claims in our sample. In addition, the performance of the Logistic

231. See, e.g., John R. Searle, Minds, Brains, and Programs, in MIND DESIGN 282–306 (John Haugeland

ed., 1981) (describing intentionality in the human brain in relation to computer programs’ lack of intentionality);

HUBERT DREYFUS, WHAT COMPUTERS STILL CAN’T DO: A CRITIQUE OF ARTIFICIAL REASON (1992) (explaining

the history and origin of artificial intelligence).


Regression classifier is not much worse than our ensemble classifier.232 The

classifier was trained on a 1:1 ratio of eligible to ineligible examples. As shown

in Plot 7, this ratio results in a classifier with roughly equal precision and recall

scores for the ineligible class. Note that such a classifier is somewhat aggressive

with respect to classifying claims as ineligible. Such a classifier is “good” at

finding ineligible claims, at the expense of additional false positives (claims

classified as ineligible that are actually eligible).233
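A minimal sketch of this training step is shown below. It assumes the labeled claims are held in a pandas DataFrame with placeholder column names, and it simply downsamples the eligible class to reach the 1:1 ratio before fitting the model; the details of our actual pipeline (stemming, feature weighting, and so on) are omitted.

# Minimal sketch (placeholder column names): balance the training data to a
# 1:1 ratio of eligible to ineligible claims, then fit a logistic regression.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_balanced_classifier(claims: pd.DataFrame):
    """'claims' has columns 'text' (claim text) and 'label' ('eligible'/'ineligible')."""
    ineligible = claims[claims.label == "ineligible"]
    # Downsample the much larger eligible class to match the ineligible count.
    eligible = claims[claims.label == "eligible"].sample(
        n=len(ineligible), random_state=0)
    balanced = pd.concat([eligible, ineligible])

    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(balanced.text, balanced.label)
    return model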

B. Classifier Validation

We next determined whether our classifier could serve as an acceptable

proxy for the decision making of the Federal Courts. Our classifier was trained

based upon decisions made by examiners at the Patent Office. It is thus natural

to ask whether such a classifier replicates the decision making of judges in the

Federal Courts. If it does not, then it is unlikely that our classifier can tell us

with any precision how many patents have been invalidated under Alice.

To validate the performance of our classifier, we evaluated claims from

post-Alice cases appealed to the Court of Appeals for the Federal Circuit. The Patent

Office maintains a record of subject matter eligibility court decisions.234 From

this record, we obtained a list of patents that had been the subject of appeals

heard by the CAFC in the post-Alice timeframe. The list included seventy-seven

patents that were each associated with an indicator of whether the patent claim

at issue was held eligible or ineligible.235 We then pulled the relevant

independent patent claims from each patent in our list.236 Of these seventy-seven

claims, the CAFC held that sixty-three (82%) were not directed to patent-eligible

subject matter.237

After training, we next evaluated the Federal Circuit claims using our

classifier. Table 5, below, compares the performance of our classifier on test

claims drawn from our Patent Office dataset to its performance on the claims of

the Federal Circuit dataset. Note that the classifier was not exposed to any of

these claims during training. In the case of the Patent Office data set, the test

claims were held out and not used during training.

232. The ensemble discussed with respect to Table 4 has an MCC score of 0.485 while the Logistic

Regression classifier has a score of 0.448. While the ensemble is better, its use is not necessary to obtain a rough

estimate of the number of invalid patents.

233. From Plot 7, such a classifier has an ineligible recall rate of about 0.80 but an ineligible precision of

around 0.35.

234. Chart of Subject Matter Eligibility Court Decisions, U.S. PATENT & TRADEMARK OFFICE,

https://www.uspto.gov/sites/default/ files/documents/ieg-sme_crt_dec.xlsx (last updated Jul. 31, 2017).

235. Id.

236. For some patents, the Chart of Subject Matter Eligibility Court Decisions identifies the specific claims
analyzed by the court. For these patents, we pulled the first independent claim from the list of identified claims;
for other patents, we used claim 1 as the representative claim.

237. Chart of Subject Matter Eligibility Court Decisions, supra note 234.


Table 5: Classifier Performance for Patent Office and Federal Circuit Data

Test Dataset      Class                 Precision   Recall   F-score   MCC
Patent Office     Eligible (n=3776)       0.96       0.77     0.85
Patent Office     Ineligible (n=563)      0.34       0.80     0.47
Patent Office     Average                 0.88       0.77     0.80     0.412
Federal Circuit   Eligible (n=14)         0.30       0.57     0.39
Federal Circuit   Ineligible (n=63)       0.88       0.70     0.78
Federal Circuit   Average                 0.77       0.68     0.71     0.218
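The figures in Table 5 can be reproduced from held-out predictions using standard scikit-learn utilities. The sketch below is illustrative, with placeholder variable names, and applies equally to the Patent Office and Federal Circuit test sets.

# Minimal sketch: per-class precision, recall, and F-score, plus the Matthews
# correlation coefficient, for a held-out test set (placeholder names).
from sklearn.metrics import classification_report, matthews_corrcoef

def report_performance(model, test_texts, test_labels):
    predictions = model.predict(test_texts)
    # Per-class precision, recall, and F-score, as in the rows of Table 5.
    print(classification_report(test_labels, predictions, digits=2))
    # A single summary score that is robust to the skewed class distribution.
    print("MCC:", round(matthews_corrcoef(test_labels, predictions), 3))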

We can make several observations about the above results. First, for the

Federal Circuit data, we do not make much out of the recall and precision scores

for the patent eligible class, because the collection does not truly reflect the

landscape of litigated patent claims. Specifically, the Federal Circuit data is

skewed in the opposite direction from our training data. Our training data includes about 13%

ineligible claims, whereas the Federal Circuit dataset includes about 80%

ineligible claims. This should come as no surprise as there is a powerful

selection bias at work in the Federal Circuit dataset. In particular, the Federal

Circuit dataset only includes cases where the issue of subject matter eligibility

was raised at trial and ultimately appealed to the CAFC. It does not include

examples from the hundreds, if not thousands, of patent cases where the issue

was never even raised during the suit.238 There are thus many likely patent-

eligible claims that do not appear in the Federal Circuit dataset.239

We are most interested in the recall rate for the ineligible class. The recall

rate for ineligible claims reflects the classifier’s ability to find ineligible claims,

and thus, invalid patents. The classifier correctly identified about 70% of the

ineligible claims in the Federal Circuit dataset, which is not so different from the

ineligible recall rate of 80% in our set of Patent Office data. The classifier

appears also to do well in terms of ineligible precision in the Federal Circuit data

set, but to a large degree, this number just reflects the fact that the data is skewed

heavily in favor of ineligible claims.

In the end, the best we can say at this point is that our classifier reasonably

replicates the ability of CAFC judges to find ineligible patent claims. While the

Federal Circuit dataset is simply too small and skewed to draw any deeper

conclusions, this result at least gives us confidence that we can use the classifier

to roughly estimate the number of patents made ineligible under Alice, as

discussed further below.

238. At least 4,000 patent lawsuits have been filed in each year of the period 2012–2016. Jacqueline Bell,

Patent Litigation in US District Courts: A 2016 Review, LAW 360 (Mar. 1, 2017, 12:13 P.M.),

https://www.law360.com/articles/895435/patent-litigation-in-us-district-courts-a-2016-review.

239. Future work is directed to obtaining a higher quality Federal Court data set that includes claims from

the many litigated patents where the issue of eligibility was never raised, or was raised and answered in favor of

the patentee.


C. Evaluation of Issued Patent Claims

We next turned our classifier onto a 1% sample of the patents in our patent

document corpus. Our corpus contains about four million utility patents issued

between 1996–2016. The sample therefore contains about 40,000 patents. Plot

8, below, shows the predicted invalidity rate by year, as produced by our

classifier.
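The per-year rates plotted below were produced by a computation along the following lines; the sketch is illustrative, with placeholder column names for the sampled corpus.

# Minimal sketch (placeholder column names): predicted invalidity rate by
# issue year for the 1% sample of the patent corpus.
import pandas as pd

def invalidity_rate_by_year(model, sample: pd.DataFrame):
    """'sample' has columns 'issue_year' and 'claim_text' (first independent claim)."""
    sample = sample.copy()
    sample["prediction"] = model.predict(sample.claim_text)
    return (sample.assign(ineligible=sample.prediction == "ineligible")
                  .groupby("issue_year")["ineligible"]
                  .mean())              # fraction predicted ineligible, per year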

Plot 8: Predicted Invalidity Rate by Year

Plot 8 has a number of interesting features. As an initial matter, the graph

shows a predicted invalidity rate in excess of 10% for most years. This number

is too high for reasons that will be discussed below. At this point we are more

interested in year-over-year changes than in the exact predicted rate. First, the

graph shows a marked drop in the invalidity rate after 2014, which coincides

with the implementation of the Alice-review standards within the Patent Office.

Second, between 1996 and 2014, there is a clear upward trend in predicted

invalidity rate. This trend appears to be due at least in part to the rise of

computer-related industry sectors over time. As shown above, in Plot 4 and

Section IV, the Alice decision has disproportionately impacted computer-related

technologies compared to, for example, the mechanical arts. Over the last two

decades, the share of issued patents that are directed to computer-related

technologies has increased over time, which explains at least some of the rise in

predicted invalidity rate.240 This effect can be seen in Plot 9, below.

240. U.S. PATENT & TRADEMARK OFFICE, Patent Counts By Class By Year, https://www.uspto.gov/web/

offices/ac/ido/oeip/taf/cbcby.htm (last updated Dec. 2015).


Plot 9: Predicted Invalidity Rate and Classes G06F and G06Q Over Time

Plot 9 shows the predicted invalidity rate (top line) along with the yearly

share for classes G06F (digital data processing, middle line)241 and G06Q

(business methods, lower line).242 The yearly share is the fraction of issued

patents in a given class for a given year. Between 1996 and 2015, class G06F

increased its share from about 4% to about 11% of the total yearly number of

issued patents.243 During the same period, class G06Q similarly rose from a

share of less than 0.5% to a share of about 2%.244 Notably, the share of class

G06Q, where over 60% of the applications were rejected under Alice,245 declined

by about 50% after 2014.246 It is likely that this decline is a result of the

heightened scrutiny under Alice faced by applications in this class.

In order to estimate the number of patents invalidated under Alice, we need

an accurate estimate of the invalidity rate. Our classifier predicts an invalidity
rate of about 13% for patents issued during the pre-Alice period of 1996–2013.

This number seems too high on its face. But this should not be a surprise when

we consider the precision and recall rates of our classifier. For the ineligible

class, the precision and recall rates are 0.34 and 0.80, respectively. The precision

score tells us that for every 100 claims classified as ineligible, only thirty-four

are actually correctly classified. The recall score tells us that those correctly
classified claims represent only about 80% of the total population of

ineligible claims. Thus, starting with the predicted invalidity rate of 13%, it

seems safe to say that 5% (≈ 13% x 0.34 / 0.80) is a more accurate number.

241. CPC class G06F is entitled ELECTRICAL DIGITAL DATA PROCESSING.

242. CPC class G06Q is entitled DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY

ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY

OR FORECASTING PURPOSES.

243. See supra Plot 9 (illustrating class G06F increased over time).

244. Supra Plot 9.

245. Supra Plot 4.

246. See supra Plot 9 (showing specifically, from 2.3% in 2014 to 0.8 % in 2015 and 1.2% in 2016).


Next, we estimate the total number of in-force patents issued during the

pre-Alice period prior to 2014. One accounting estimates that there were about

2.5 million patents in force in 2014.247 We reduce this number to 2 million

patents to exclude patents issued during 2014.248

Assuming a 5% invalidity rate and about 2 million patents in force at the

time of the Alice decision, we estimate that about 100,000 patents have at least

one claim that is likely invalid under Alice. How reliable is this estimate? It is

of course possible that the classifier overestimated the number of ineligible

claims. As discussed above, we have attempted to account for the classifier’s

bias by reducing the original estimate of 13% to 5%. Even if our process still

grossly overestimates (e.g., by a factor of two) the number of invalid patents, the

total would still number around 50,000 patents.
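The arithmetic behind these estimates is simple enough to state directly. The short calculation below restates it with the figures used above; the same scaling, using the classifier's ineligible precision and recall, is applied to the class G06Q numbers discussed below.

# Worked numbers for the adjustment described above: scale the raw predicted
# invalidity rate by ineligible precision and divide by ineligible recall to
# approximate the true rate, then apply it to the in-force patent count.
def adjusted_rate(predicted_rate, precision, recall):
    # true positives ≈ predicted positives * precision;
    # actual positives ≈ true positives / recall
    return predicted_rate * precision / recall

rate = adjusted_rate(0.13, 0.34, 0.80)
print(round(rate, 3))                  # 0.055, i.e., roughly 5%
print(round(rate * 2_000_000))         # 110500; rounding the rate to 5% gives ~100,000
print(round(rate * 2_000_000 / 2))     # 55250; the factor-of-two sensitivity check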

If anything, our process may be conservative compared to the application

of Alice by the Federal Courts. One study indicates that as of June 2016, over

500 patents have been challenged under Alice, with a resulting invalidation rate

exceeding 65%.249 An earlier study of over 200 Federal Court decisions showed

an invalidation rate of over 70%.250 Of course, these studies focus only on cases

where a defendant has moved to invalidate a patent under Alice, and are thus

skewed towards cases selected from suspect subject matter areas, such as

business methods, advertising, software, and the like.

The outcome of our analysis is even more profound in specific subject

matter areas. For example, over 80% of about 400 pre-2014 claims in CPC class

G06Q (data processing systems/methods for administrative, commercial,

financial, and managerial purposes) were classified as ineligible by the

classifier.251 There are over 50,000 issued patents in this class.252 If we scale,

as above, the 80% ineligibility finding to 35% to account for the classifier’s bias

towards invalidity,253 this still means that the Supreme Court may well have

invalidated over 15,000 patents in this class alone. Perhaps this impact was the

Court’s intent, although it certainly seems an extreme realignment of property

rights at the stroke of a judge’s pen.

247. See Dennis Crouch, The Number of U.S. Patents in Force, PATENTLY-O, https://patentlyo.com/

patent/2014/10/number-patents-force.html (last updated Oct. 25, 2014) (estimating that about 2.5 million U.S.

patents in force in 2014).

248. See U.S. Patent Statistics Chart Years 1963–2015, USPTO, https://www.uspto.gov/web/offices/ac/

ido/oeip/taf/us_stat.htm (last visited Jan. 28, 2018) (specifying that by 2014, the patent office was issuing

roughly 300,000 patents per year, which justifies reducing our estimate to 2 million from 2.5 million).

249. Tran, supra note 17, at 358.

250. See Sachs, supra note 3 (analyzing 208 Federal Court Alice decisions).

251. As a sanity check, the classifier processed a sampling of 3,731 pre-2014 claims from CPC class F02B

(internal combustion piston engines) and predicted only 184 (about 5%) to be patent ineligible. This result again

matches our intuition that claims that are mechanical in nature ought to be more likely to be patent eligible.

252. U.S. Patent Statistics Chart Years 1963–2015, supra note 248.

253. We are conservatively scaling the predicted rate by 0.4 based on ineligible precision and recall rates

of 0.34 and 0.80, respectively. Note that the justification for scaling in this context is not as strong as for the

average case (general population), because class G06Q undoubtedly contains a higher than average ratio of

ineligible to eligible claims.


VIII. CONCLUSION

We have shown that it is possible to predict, with a reasonably high degree

of confidence, whether a patent claim is patent eligible under the Alice test. This

prediction is based on thousands of decisions made by human patent examiners

charged with implementing the Alice test during the patent examination process.

The approach developed in this Article has many practical applications,

including providing support for patent practitioners and clients during patent

preparation, prosecution, assertion, and valuation. Using machine intelligence

to identify Alice-related validity issues may yield economic efficiencies, by

diverting legal fees away from non-patentable inventions, by improving claims

and thereby streamlining the interactions between applicants and examiners, and

by reducing baseless litigation of invalid patent claims. Our use of machine

intelligence can assist and improve the analytic functions provided by attorneys,

while at the same time providing clients with better information for deciding

how to allocate and apply legal resources.

We have also used our machine classification approach to quantitatively

estimate, for the first time, how many issued patents have been invalidated under

Alice, thereby demonstrating the profound and far-reaching impact of the

Supreme Court’s recent subject matter eligibility jurisprudence on the body of

granted patents. Specifically, by invalidating at least tens of thousands of issued

patents, the Court’s actions represent a judicial remaking of patent law that has

resulted in a considerable realignment of existing intellectual property rights.