Demystifying Advanced Technologies to Find Solutions that Work

Demystifying Advanced Technologies to Find Solutions that Work Friday, Oct. 11 | 9:45 – 10:45 Presented by

description

A Corporate Counsel headline from late last year asked, “Can Predictive Coding Save The World?” A better, albeit more modest question is, can it save you money? This panel addresses that loaded question and the related issues of:

• Deploying advanced technologies across enterprise data,

• Measuring the effectiveness of advanced technologies,

• Vetting and selecting appropriate service providers, and

• Validating the results of predictive coding.

In this panel, IT and legal experts survey the technology horizon, giving you insights and best practices for finding the solutions that work best for you.

Transcript of Demystifying Advanced Technologies to Find Solutions that Work

Page 2: Demystifying Advanced Technologies to Find Solutions that Work

Peter Oesterling

Assistant General Counsel | Nationwide

Page 3: Demystifying Advanced Technologies to Find Solutions that Work

Alex Ponce de Leon

Discovery Counsel | Intel

Page 4: Demystifying Advanced Technologies to Find Solutions that Work

J. William Speros

Evidence Consulting Attorney | Speros & Associates

Page 5: Demystifying Advanced Technologies to Find Solutions that Work

“‘Technology-Assisted Review,’ called by its nickname ‘Predictive Coding,’ describes a process whereby computers are programmed to search a large amount of data to find quickly and efficiently the data that meet a particular requirement. Computer science and the sciences of statistics and psychology inform its use. While it bruises the human ego, scientists … determined that … [i]t is now indubitable that technology-assisted review is an appreciably better and more accurate means of searching a set of data.”

THE GROSSMAN-CORMACK GLOSSARY OF TECHNOLOGY-ASSISTED REVIEW

FEDERAL COURTS LAW REVIEW Volume 7, Issue 1 (2013) Foreword by John M. Facciola, U.S. Magistrate Judge

Page 6: Demystifying Advanced Technologies to Find Solutions that Work

Process: a series of actions that produce something or that lead to a particular result

Page 7: Demystifying Advanced Technologies to Find Solutions that Work

“Now, the methodology of the use of technology-assisted review may itself be in dispute, with the parties controverted to each other’s use of a particular method or tool. Those controversies have already led to judicial decisions that have to grapple with a wholly new way of searching and with scientific principles derived from the science of statistics or other disciplines.”

THE GROSSMAN-CORMACK GLOSSARY OF TECHNOLOGY-ASSISTED REVIEW

FEDERAL COURTS LAW REVIEW Volume 7, Issue 1 (2013) Foreword by John M. Facciola, U.S. Magistrate Judge

Page 8: Demystifying Advanced Technologies to Find Solutions that Work

Methodology: a set of methods, rules, or ideas that are important in a science or art; a particular procedure or set of procedures

Page 9: Demystifying Advanced Technologies to Find Solutions that Work

THE GROSSMAN-CORMACK GLOSSARY OF TECHNOLOGY-ASSISTED REVIEW

FEDERAL COURTS LAW REVIEW Volume 7, Issue 1 (2013)

Predictive Coding: An industry-specific term generally used to describe a Technology-Assisted Review process involving the use of a Machine Learning Algorithm to distinguish Relevant from Non-Relevant Documents, based on Subject Matter Expert(s)’ Coding of a Training Set of Documents.
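The glossary definition above can be made concrete with a minimal sketch. It assumes a tiny, hypothetical training set coded by a reviewer and uses a naive Bayes word-count model as a stand-in for the "Machine Learning Algorithm"; the glossary does not prescribe any particular algorithm, and the function names (`tokenize`, `train`, `score`) and sample documents are illustrative only.

```python
import math
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens."""
    return text.lower().split()


def train(training_set):
    """Build word counts per label from (text, is_relevant) pairs.

    Assumes the training set contains at least one relevant and one
    non-relevant document.
    """
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in training_set:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def score(model, text):
    """Return the log-odds that a document is relevant (positive = relevant)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_odds = 0.0
    for label, sign in ((True, 1), (False, -1)):
        total_words = sum(word_counts[label].values())
        log_p = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing so unseen word/label pairs do not zero out.
                log_p += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        log_odds += sign * log_p
    return log_odds


# Hypothetical training set: text plus the reviewer's relevance coding.
training_set = [
    ("merger pricing agreement draft", True),
    ("pricing terms for the merger", True),
    ("lunch menu friday", False),
    ("office party friday lunch", False),
]
model = train(training_set)

for doc in ("revised merger pricing", "friday lunch plans"):
    label = "Relevant" if score(model, doc) > 0 else "Non-Relevant"
    print(doc, "->", label)
```

The point of the sketch is only that the model knows nothing beyond what the coded training set teaches it, which is why the later slides spend so much time on how that set is chosen.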

Page 11: Demystifying Advanced Technologies to Find Solutions that Work

“A word is not a crystal, transparent and unchanged, it is the skin of a living thought and may vary greatly in color and content according to the circumstances and the time in which it is used.”

Justice Oliver Wendell Holmes Jr., Towne v. Eisner, 245 U.S. 418, 425 (1918)

THE GROSSMAN-CORMACK GLOSSARY OF TECHNOLOGY-ASSISTED REVIEW

FEDERAL COURTS LAW REVIEW Volume 7, Issue 1 (2013) Foreword by John M. Facciola, U.S. Magistrate Judge

Page 16: Demystifying Advanced Technologies to Find Solutions that Work

Published as guest contributor to Ralph Losey’s E-Discovery Team Blog Site:

http://e-discoveryteam.com/2013/04/28/predictive-codings-erroneous-zones-are-emerging-junk-science/?shareadraft=517d80048f827

“Predictive Coding’s Erroneous Zones Are Emerging Junk Science”

Page 17: Demystifying Advanced Technologies to Find Solutions that Work

“Predictive Coding’s Erroneous Zones Are Emerging Junk Science”

• “PBS’ Frontline’s Forensic Tools: What’s Reliable and What’s Not-So-Scientific dispelled the infallibility, and in some instances, the validity, of analytical techniques long relied upon by our legal profession.”

• “Even if those techniques were not botched or biased, their validity ranges from bought-and-paid-for infomercials to, at best, an approximation.”

• “Back then attorneys and judges (and experts and vendors) did with those junk sciences just what we are doing now with respect to predictive coding: allowing claims, however unjustified and erroneous, to form the basis of our practices, to influence our precedent and to accrue authority.”

Page 18: Demystifying Advanced Technologies to Find Solutions that Work

“[T]hose of us who trust the scientific and adversarial process recognize that erroneous claims don’t naturally defeat truth. They suppress truth, distract from truth and sometimes persist so long that we forget to inquire into the truth. Oftentimes, weak interests seek to dispel erroneous claims which are promoted by strong commercial interests. With respect to predictive coding my sense is that we are neither deluded nor deceptive — well, not too much anyway — but we just have not yet thought it through.”

“Predictive Coding’s Erroneous Zones Are Emerging Junk Science”

Page 20: Demystifying Advanced Technologies to Find Solutions that Work

Erroneous Practice #1

Using a full-text search to identify prospectively responsive documents and then employing predictive coding to eliminate those that are not responsive.

Erroneous Practice #2

Pulling a random sample of documents to train the initial seed set.

Erroneous Practice #3

Identifying “magic numbers” of minimum:

• “Iterations”

• Responsive documents within a randomly accumulated set

Erroneous Practice #4

Asserting that Predictive Coding software is the “gold standard” for document retrieval in complex matters.

“Predictive Coding’s Erroneous Zones Are Emerging Junk Science”

Page 21: Demystifying Advanced Technologies to Find Solutions that Work

Erroneous Practice #4

Asserting that Predictive Coding software is the “gold standard” for document retrieval in complex matters.

Is Erroneous Because

It asserts that predictive coding is a standard, yet offerings labeled “predictive coding”:

• Share some commonly understood characteristics but no precise attributes

• Involve some general methodologies but no clear rules

• Are associated with general aspirations but no comprehensively defined operations

Example: All advertisements or orders for “predictive coding”

“Predictive Coding’s Erroneous Zones Are Emerging Junk Science”

Page 23: Demystifying Advanced Technologies to Find Solutions that Work

Gold Standard vs “Standard”

Page 24: Demystifying Advanced Technologies to Find Solutions that Work

Erroneous Practice #2

Pulling a random sample of documents to train the initial seed set.

Is Erroneous Because

A. Looks for relevance in all the wrong places: Thoughtful researchers don’t try to learn about relevant docs by examining irrelevant ones.

B. It turns a blind eye to what is staring you in the eye: denies that attorneys know what they are paid to know: where to look and what to find.

C. Measures the wrong stuff:

• Constrained and circular “like” definition

• Prevalence vs Relevance vs Probativeness

Example: Global Aerospace v. Landow Aviation (settled without court ruling re strategy)

“Predictive Coding’s Erroneous Zones Are Emerging Junk Science”
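One way to read point A is as simple sampling arithmetic. The sketch below is an illustration rather than anything taken from the article: the prevalence figures and the sample size of 500 are hypothetical, and each draw is assumed independent, which is a fair approximation when the collection is far larger than the sample.

```python
def expected_relevant(sample_size, prevalence):
    """Expected number of relevant documents in a simple random sample."""
    return sample_size * prevalence


def prob_no_relevant(sample_size, prevalence):
    """Chance that the sample contains no relevant documents at all.

    Assumes independent draws (collection much larger than the sample).
    """
    return (1.0 - prevalence) ** sample_size


# Hypothetical prevalence rates, chosen only for illustration.
sample_size = 500
for prevalence in (0.10, 0.01, 0.001):
    print(
        f"prevalence={prevalence:.3f}",
        f"expected relevant={expected_relevant(sample_size, prevalence):.1f}",
        f"P(none)={prob_no_relevant(sample_size, prevalence):.3f}",
    )
```

Under these assumptions, at a prevalence of 0.1% a 500-document random seed set is more likely than not to contain no relevant documents, so nearly all of the reviewer's training effort goes into examining irrelevant ones.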

Page 27: Demystifying Advanced Technologies to Find Solutions that Work

Erroneous Practice #1

Using a full-text search to identify prospectively responsive documents and then employing predictive coding to eliminate those that are not responsive.

Is Erroneous Because

A. Over-relies and under-delivers: presumed arrogance or clairvoyance

B. It arbitrarily places documents out-of-sight and, therefore, out-of-mind: likelihood that responsive documents will ever be produced but dumbing-down the predictive coding intelligence

Example: In re: Biomet M2a Magnum Hip Implant Prods. Liab. Litig. (endorsed by court)

“Predictive Coding’s Erroneous Zones Are Emerging Junk Science”
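Point B can be illustrated with a small calculation. This is a sketch under stated assumptions, not a description of what happened in Biomet: the document IDs, the keyword hit list, and the 80% classifier recall are all hypothetical. It shows that any responsive document the keyword search misses is never seen by the predictive coding step, so the keyword filter sets a ceiling on overall recall.

```python
def recall_ceiling(responsive_ids, keyword_hit_ids):
    """Fraction of responsive documents that survive the keyword cull."""
    responsive = set(responsive_ids)
    if not responsive:
        raise ValueError("need at least one responsive document")
    return len(responsive & set(keyword_hit_ids)) / len(responsive)


def overall_recall(responsive_ids, keyword_hit_ids, classifier_recall_on_hits):
    """Recall of the two-stage process: keyword cull, then classifier.

    classifier_recall_on_hits is the share of the surviving responsive
    documents that the classifier goes on to find.
    """
    return recall_ceiling(responsive_ids, keyword_hit_ids) * classifier_recall_on_hits


# Hypothetical collection: documents 1-10 are responsive.
responsive_ids = range(1, 11)
# Hypothetical keyword search: finds six of them plus two non-responsive ones.
keyword_hit_ids = [1, 2, 3, 4, 5, 6, 20, 21]

print("ceiling:", recall_ceiling(responsive_ids, keyword_hit_ids))
print("overall:", overall_recall(responsive_ids, keyword_hit_ids, 0.8))
```

In this hypothetical, even a classifier that found every surviving responsive document could not lift overall recall above the share of responsive documents that the keywords let through.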

Page 30: Demystifying Advanced Technologies to Find Solutions that Work

Erroneous Practice #3

Identifying “magic numbers” of minimum:

• “Iterations”

• Responsive documents within a randomly accumulated set

Is Erroneous Because

A. You may not be able to get there from here: Don’t know starting point or ending point

B. You don’t know what isn’t yet known: Cannot predict alternative paths

C. Consider low frequency, high probativeness

D. Who’s the witness?

Example

• “This [iteration] process shall be repeated for a total of seven iterations… [Requesting party pays] costs and fees… [for] more 40,000 documents.” (DaSilva Moore)

• Vendors’ affidavits in various matters

“Predictive Coding’s Erroneous Zones Are Emerging Junk Science”
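A short sketch may help show why a fixed count is hard to defend in the abstract. It assumes recall is estimated from a random validation sample and uses the Wilson score interval, a standard statistical formula that the slides themselves do not mention; the sample figures (8 of 10 versus 80 of 100 responsive documents found) are hypothetical.

```python
import math


def wilson_interval(found, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%).

    found: responsive documents in the sample that the process retrieved
    total: all responsive documents in the sample
    """
    if total <= 0:
        raise ValueError("sample must contain at least one responsive document")
    p = found / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half_width, center + half_width


# Same observed recall (80%), very different certainty.
for found, total in ((8, 10), (80, 100)):
    low, high = wilson_interval(found, total)
    print(f"{found}/{total}: recall between {low:.2f} and {high:.2f}")
```

How many responsive documents a random set happens to contain depends on prevalence, which is not known at the outset, so the width of the interval cannot be fixed in advance by agreeing on a number of iterations or documents.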

Page 34: Demystifying Advanced Technologies to Find Solutions that Work

[Figure: Search Mechanisms’ Inferences. Axes: “Inferences (risk) re recall” against “Search Mechanism.” Mechanisms shown: Databases; Files, Folders (in place); End-user tags; Files, Folders (per user); Duplicates; E-mail threading and “Near” Duplicates; Key words; Random Sampling; Similarity/Clusters Sorting; Similarity; Clustering; “Technology Assisted Review” via Machine Learning.]
