Best Practices to Avoid Missing Key Evidence in Large Document Reviews

How Proper or Improper Search, Processing and Indexing Can Make or Break Your Case

Erin Derby, ACEDS, Lexbe


eDiscovery Webinar Series

About Lexbe

We are an Austin, TX-based eDiscovery software and services provider, specializing in serving small and medium-sized law firms and organizations. We provide:

● Cloud-based DIY eDiscovery processing & document review software

● High-speed ESI document processing and data conversion services

● Experienced eDiscovery specialists and expert consultants

Lexbe Sales [email protected]

(800) 401-7809 x22

‘Cost-effective eDiscovery’

“A powerful litigation document management service”

‘Secure, easy-to-use and a great review tool for consideration’

About our Webinars

● Webinars take place monthly and cover a variety of relevant eDiscovery topics.

● If you have technical issues or questions, please email [email protected]

● Lexbe webinars are available for viewing as streaming video and can be downloaded as a PDF presentation or an MP3 podcast. This webinar is part of the Lexbe eDiscovery Webinar Series, which includes a complete listing of other on-demand webinars.

● For notices of future live and on-demand webinars in this series, please email us at [email protected] or follow us on LinkedIn.

Erin Derby bio

● eDiscovery Specialist at Lexbe, specializing in working with clients handling eDiscovery in complex litigation that requires a high level of precision and expertise.

● Certified by the Association of Certified E-Discovery Specialists (ACEDS).

● Erin is also a Litigation Paralegal with 10 years of experience with both plaintiff and defense law firms in a variety of practice areas.

Erin Derby

800-401-7809

(512) 956-5594

[email protected]

Agenda

● Overview of Modern Search and eDiscovery Indexing Technologies

● Basic and Advanced Search Options in Use Today

● Search Indexing 'Gotchas'

● Pitfalls of Relying Solely on Image-Based OCR Indexing

● Why Native Extraction Alone is Ineffective

● Complexities of Working with Foreign Language & Translated Text

● Optimal Search with a Concatenated Indexing Approach

● Takeaways

Use of Keyword Search in Discovery

● Early Stage Culling - Reduce the amount of ESI for review by using keywords, date ranges and custodians to cull document collections.

● Keyword-Based Responsive and Privilege Review - Construct search queries to return documents that are likely to be responsive and/or privileged.

● ID Documents for Depo Prep - Find and assign key documents related to specific case participants to prepare for depositions.

● ID of Key Docs for Trial - Find and mark key case documents. Code documents that will be needed for trial.

Best Practices: eDiscovery Search

Construct Quality Searches

● Start with the Request for Production or Subpoena - Translate the demands of the RFP/SDT into a keyword search strategy.

● Interview Custodians - Ask key case participants/data custodians about their ESI. Use their insights and their terminology to find obscure key documents.

● Include Jargon - Seek out terms, abbreviations and acronyms specific to the industry or to the company's sub-culture that you may not be familiar with.

● Include Misspellings - Include misspelled versions of keywords in your search string (or use “fuzzy search” settings or Boolean limiters) to account for emails or other documents with typos.
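Fuzzy search settings vary by platform. To illustrate roughly how a fuzzy match catches typos, the sketch below uses plain Levenshtein edit distance. The function names and the one-edit threshold are illustrative assumptions. This is not any review platform's actual fuzzy algorithm.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_hits(keyword: str, words: list[str], max_edits: int = 1) -> set[str]:
    """Return the distinct words within max_edits of the keyword (case-insensitive)."""
    key = keyword.lower()
    return {w for w in words if levenshtein(key, w.lower()) <= max_edits}


words = "Please review the contract and the contrct addendum before the meeting".split()
print(sorted(fuzzy_hits("contract", words)))  # ['contract', 'contrct']
```

Plain Levenshtein counts a swapped pair of letters ("contarct") as two edits. A Damerau-style distance would count it as one. This is one reason to test fuzzy settings against real typos found in the collection.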

Pros of Keyword Searching

● Fast - Keyword search is very fast compared with other document search methodologies.

● Inexpensive - Good results are obtained at little cost compared with manual review or other computer-assisted methodologies.

● Quality - Search can deliver high-quality results, particularly if keyword terms are carefully developed and tested.

● Avoids Manual Review Errors/Inconsistencies - Search results are computer generated and avoid known human review errors that can result from fatigue, inadequate training or lack of focus.

Cons of Keyword Searching

● Search Can Be Over- or Under-Inclusive - Search terms can bring back too many junk results or miss good results. These are known as “false positives” and “false negatives.”

● Difficulty of Creating Good Search Terms - Constructing good search terms takes design time, testing, iterations and analysis.

● Non-Searchable Text - Search results are only as good as the underlying searchable text. For a variety of reasons, ESI collections and review tools can miss text that a human reviewer might catch.

● Some File Types Cannot Be Indexed - There is little consistency across databases in which file types can be indexed.
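Over- and under-inclusiveness are commonly measured as precision and recall. Precision is the share of hits that are relevant. Recall is the share of relevant documents that were hit. The minimal sketch below assumes a hypothetical set of documents that a reviewer has already coded as relevant.

```python
def precision_recall(hits: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = share of hits that are relevant; recall = share of relevant docs hit."""
    true_pos = len(hits & relevant)
    false_pos = len(hits - relevant)   # junk results (over-inclusive)
    false_neg = len(relevant - hits)   # missed results (under-inclusive)
    precision = true_pos / (true_pos + false_pos) if hits else 0.0
    recall = true_pos / (true_pos + false_neg) if relevant else 0.0
    return precision, recall


# Hypothetical control set: document IDs a reviewer coded as relevant.
relevant = {"DOC-01", "DOC-02", "DOC-03", "DOC-04"}
# Hypothetical document IDs returned by a search term.
hits = {"DOC-01", "DOC-02", "DOC-07", "DOC-08", "DOC-09"}

p, r = precision_recall(hits, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.40 recall=0.50
```

Low precision points to false positives, which call for limiters. Low recall points to false negatives, which call for expanders or new keywords.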

Advanced eDiscovery Search

Using Search Limiters

Search Limiters Reduce False Positives (Noise)

● Filter Out Unneeded File Types. Some file types are unlikely to lead to useful information and can be excluded.

● Use Boolean Modifiers to Limit Overly Expansive Searches.

○ Boolean modifiers can reduce the number of documents returned from a query while increasing the relevance of those files.

○ Exclude certain words or combinations and specify word order.
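Filtering by file type can be as simple as checking extensions against an exclusion list. The sketch below is minimal and the excluded extensions are a hypothetical example. Processing tools typically identify file types from file signatures rather than trusting the extension alone.

```python
from pathlib import PurePath

# Hypothetical exclusion list; which types are "unneeded" is a case-by-case judgment.
EXCLUDED_EXTENSIONS = {".exe", ".dll", ".tmp", ".ini"}


def filter_file_types(paths: list[str], excluded: set[str] = EXCLUDED_EXTENSIONS) -> list[str]:
    """Keep only files whose extension (compared case-insensitively) is not excluded."""
    return [p for p in paths if PurePath(p).suffix.lower() not in excluded]


collection = ["memo.docx", "setup.EXE", "budget.xlsx", "cache.tmp", "email.msg"]
print(filter_file_types(collection))  # ['memo.docx', 'budget.xlsx', 'email.msg']
```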

Using Search Expanders

Basic Boolean Operators:

● AND : returns results that include both terms

● OR : returns results that include at least one of a list of terms

● NOT : excludes terms you don’t want

● ( ) : separates OR statements from the rest of the Boolean string

● PRE/n : the first search term precedes the second term by no more than n words

● Wildcard Characters: ‘*’ replaces a letter in your search term; ‘!’ allows for stemming search within a Boolean query
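The sketch below shows what these operators do. It evaluates AND/OR/NOT with Python's own boolean operators and implements PRE/n over word positions. It rests on a simplifying assumption: "no more than n words" means the second term sits at most n positions after the first, so adjacent words are 1 apart. Platforms may count intervening words differently. The function names are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def contains(tokens: list[str], term: str) -> bool:
    return term.lower() in tokens


def pre_n(tokens: list[str], first: str, second: str, n: int) -> bool:
    """True if `first` precedes `second` by no more than n word positions."""
    firsts = [i for i, t in enumerate(tokens) if t == first.lower()]
    seconds = [j for j, t in enumerate(tokens) if t == second.lower()]
    return any(0 < j - i <= n for i in firsts for j in seconds)


doc = tokenize("The merger agreement was signed; the board approved the merger later.")

# (merger OR acquisition) AND signed NOT draft
boolean_hit = ((contains(doc, "merger") or contains(doc, "acquisition"))
               and contains(doc, "signed") and not contains(doc, "draft"))
near_hit = pre_n(doc, "merger", "agreement", 2)      # merger PRE/2 agreement
reversed_hit = pre_n(doc, "agreement", "merger", 2)  # agreement PRE/2 merger
print(boolean_hit, near_hit, reversed_hit)  # True True False
```

The reversed query fails because PRE/n is order-sensitive. "Agreement" never comes before "merger" within two words.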

Finding Personally Identifiable Information

Search for PII

● Social Security Number Search Options

○ Use the Boolean option with the '=' wildcard, where each '=' stands for a single digit (0-9):

○ 1- Wildcard "=== == ====" (spaces)

○ 2- Wildcard "=========" (no spaces)

○ 3- Wildcard "===-==-====" (dashes)

○ 4- Search "SSN"

○ 5- Search "Social Security Number"

● The same logic can be used for credit card numbers, birthdates, phone numbers, or any other PII that needs to be redacted from documents prior to production or Court filings.

Search for PII

● Phone Numbers: “=== === ====”
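The '=' digit wildcard is the review platform's own syntax. Outside such a platform, the same patterns can be written as regular expressions. The sketch below is a plain-Python translation, with `\d` standing in for '='. It is not how any particular platform implements its wildcard, and the pattern names are hypothetical.

```python
import re

# Each '=' in the wildcard strings above stands for one digit, i.e. \d in a regex.
# \b keeps a pattern from matching inside a longer run of digits.
PII_PATTERNS = {
    "ssn_spaces": re.compile(r"\b\d{3} \d{2} \d{4}\b"),     # === == ====
    "ssn_plain": re.compile(r"\b\d{9}\b"),                  # =========
    "ssn_dashes": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),     # ===-==-====
    "ssn_label": re.compile(r"\bSSN\b|\bSocial Security Number\b", re.IGNORECASE),
    "phone_spaces": re.compile(r"\b\d{3} \d{3} \d{4}\b"),   # === === ====
}


def find_pii(text: str) -> dict[str, list[str]]:
    """Return every pattern that matched, with the matched strings."""
    found = {}
    for name, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[name] = matches
    return found


sample = "Employee SSN: 123-45-6789. Call 512 555 0100 or see file 987654321."
for name, matches in find_pii(sample).items():
    print(name, matches)
```

The "ssn_plain" pattern also flags the nine-digit file number. That is why pattern hits still need human review before anything is redacted.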

Test Keyword Searching Results

● Look at Results Returned. Searching without reviewing and testing the results produces low-quality results.

● Sample and Look for Ways to Limit Search. Create new queries that reduce false positives.

● More New Keywords. Viewing search results prompts the discovery of additional keywords that could be used to expand or narrow search queries.

● Fuzzy and Concept Search. Find new keywords by searching for and returning synonyms and near-identical words.

Keyword searching becomes an iterative process.
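Sampling can be scripted so that it is both random and repeatable. The minimal sketch below uses Python's `random` module. The sample size of 25 and the fixed seed are arbitrary assumptions; an appropriate sample size depends on the matter and on any agreed protocol.

```python
import random


def sample_hits(hit_ids: list[str], sample_size: int, seed: int = 42) -> list[str]:
    """Draw a reproducible random sample of search hits for manual QC review."""
    rng = random.Random(seed)          # fixed seed so the same sample can be re-created
    k = min(sample_size, len(hit_ids)) # never ask for more documents than exist
    return sorted(rng.sample(hit_ids, k))


hits = [f"DOC-{i:04d}" for i in range(1, 501)]   # hypothetical 500 search hits
qc_batch = sample_hits(hits, 25)
print(len(qc_batch), qc_batch[:3])
```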

Ways to Improve Searchability

● Understand Data Collection. Understand the composition of the collection, its file types and any special issues. Not everything is ready to be indexed/searched; some data will need processing.

● Understand Software Used for Processing & Search - Indexing tools create database entries for files by opening, scanning, and storing specific content in ‘fields’, often based on the data’s location within the document, e.g. header data fields and body content fields.

○ If an indexing tool can’t open a file or see specific forms of data within the file, it will not be able to index it, and the data will not be searchable.
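To make the idea of fields concrete, here is a minimal sketch that uses Python's standard `email` package to split a raw email into header fields and a body field. The sample message and the chosen field list are hypothetical. Real processing tools store many more fields.

```python
from email import message_from_string
from email.policy import default

RAW_EMAIL = """\
From: Joe Connor <joe@example.com>
To: Ann Lee <ann@example.com>
Subject: Q3 budget
Date: Tue, 03 Oct 2017 09:15:00 -0500

Please see the revised numbers before Friday.
"""


def index_fields(raw: str) -> dict[str, str]:
    """Split an email into header fields and a body field, as an indexer might."""
    msg = message_from_string(raw, policy=default)
    # Header data fields: only the headers that are actually present.
    record = {name: str(msg[name]) for name in ("From", "To", "Subject", "Date") if msg[name]}
    # Body content field.
    record["Body"] = msg.get_content().strip()
    return record


record = index_fields(RAW_EMAIL)
print(record["Subject"])
print("Joe Connor" in record["From"])
```

Because the sender lives in its own field, a query can target "From" specifically instead of matching the name anywhere in the body.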

Search and eDiscovery Indexing Technologies

What happens when you load documents to a processing &/or review platform?

● Archive/Container Expansion

● File Repair

● Metadata extraction & fielding

● MD5 hash code generation

● System file identification & DeNIST

● Deduplication

● Email attachment extraction & parent email association

● Time Zone Offset for Emails

● Custodian Assignment

● Native text extraction

● OCR of images

● Indexing of extracted & OCRed text
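Several of these steps rest on file hashing. The minimal sketch below assumes files are already loaded as bytes. MD5 hashes identify exact duplicates. Files whose hashes appear on a known-system-file list are dropped; such lists include hash sets derived from NIST's National Software Reference Library, the source of the term DeNIST. The function names and sample data are hypothetical.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """MD5 hash of a file's bytes, as a 32-character hex string."""
    return hashlib.md5(data).hexdigest()


def denist_and_dedupe(files: dict[str, bytes], known_system_hashes: set[str]) -> dict[str, list[str]]:
    """Drop files whose hash is on a known-system-file list, then group exact duplicates.

    Returns {hash: [file names]}; the first name in each list would be reviewed,
    and the rest are byte-for-byte duplicates of it.
    """
    groups: dict[str, list[str]] = {}
    for name, data in files.items():
        digest = md5_hex(data)
        if digest in known_system_hashes:           # DeNIST: skip known system files
            continue
        groups.setdefault(digest, []).append(name)  # dedupe: same hash, same content
    return groups


files = {
    "custodianA/memo.docx": b"Quarterly memo text",
    "custodianB/memo_copy.docx": b"Quarterly memo text",   # exact duplicate
    "custodianA/notes.txt": b"Meeting notes",
    "custodianA/kernel32.dll": b"system binary bytes",
}
# Stand-in for a published list of known system-file hashes (hypothetical here).
known = {md5_hex(b"system binary bytes")}

for digest, names in denist_and_dedupe(files, known).items():
    print(digest[:8], names)
```

Grouping by identical hash only catches byte-for-byte duplicates. Platforms may also differ in whether they dedupe per custodian or across the whole collection.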

What is an index and why is it important?

● The purpose of storing data in an index is to optimize speed and performance in finding relevant documents for a search query.

● Without an index, the search engine would have to scan every document in the corpus, which would require considerable time and computing power.

● Modern eDiscovery engines are able to parse out “noise” or “stop” words, which greatly increases search speed.
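A toy inverted index illustrates both points. Each word maps to the set of documents that contain it, so a query becomes a few dictionary lookups instead of a scan of every document. Stop words are simply never stored. The stop-word list and sample documents are illustrative assumptions.

```python
import re
from collections import defaultdict

# A tiny illustrative stop-word list; real engines ship much longer ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each non-stop word to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word not in STOP_WORDS:
                index[word].add(doc_id)
    return index


def search(index: dict[str, set[str]], *terms: str) -> set[str]:
    """AND search: documents containing every term, found by lookups, not rescans."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


docs = {
    "DOC-1": "The smelter contract is in the budget",
    "DOC-2": "Budget review of the smelter",
    "DOC-3": "Minutes of the board meeting",
}
index = build_index(docs)
print(sorted(search(index, "smelter", "budget")))  # ['DOC-1', 'DOC-2']
print("the" in index)                              # False
```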

OCR vs Text-based

Optical Character Recognition (OCR)

How OCR Works: An OCR index receives each document as though it were a scanned image. The image is run through a virtual print driver and inspected for alphanumeric text. Any identified text is then lifted and indexed, making it searchable in your review platform.

OCR software distinguishes light and dark patterns to determine characters that match alphanumeric symbols.

OCR is not necessarily 100% accurate; accuracy depends on the quality of the image, layout, fonts, spacing, etc.

Advanced search practices can help mitigate accuracy issues.
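One such practice is to search with OCR look-alike characters folded together. The sketch below is an illustration built on assumptions, not a feature of any particular platform. The confusion map is hypothetical and would need tuning to the errors actually seen in a collection.

```python
# Hypothetical map of common OCR look-alike confusions; real error patterns
# depend on scan quality, fonts and the OCR engine.
OCR_CONFUSIONS = {"0": "o", "1": "l", "|": "l", "5": "s", "rn": "m"}


def normalize_ocr(text: str) -> str:
    """Fold look-alike characters together so 'C0nnor' and 'Connor' compare equal."""
    text = text.lower()
    for wrong, right in OCR_CONFUSIONS.items():
        text = text.replace(wrong, right)
    return text


def ocr_tolerant_match(term: str, ocr_text: str) -> bool:
    """Apply the same normalization to the search term and to the OCR text."""
    return normalize_ocr(term) in normalize_ocr(ocr_text)


print(ocr_tolerant_match("Joe Connor", "Approved by J0e C0nnor"))     # True
print(ocr_tolerant_match("NW Alum Smelters", "NW A1um 5melters"))     # True
```

This trades precision for recall. With the "rn" rule, "modern" and "modem" normalize to the same string, so results still need review.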

● Image-Based: Search results are generated from the image of documents.

● Native files (email, attachments, spreadsheets, etc.) are converted to a paginated image file, and then OCR is applied to make the text searchable (e.g. a TIFF production with no extracted text).

● Conversion software uses a ‘print-driver approach’ to virtually image what would have been physically printed.

● Data that could be missed - Headers/footers/notes, comments and revisions, highlighted text, hidden sheets or text, print selections, applied filters

Text-based Index: Text is extracted from a native document.

● Any text typed into the document is indexed; this includes hidden fields, comments, hidden sheets, BCC fields, etc.

● Text has to be typed into the document rather than appear in an embedded image (a picture of a Stop sign would not add the word “Stop” to the index).

● Data that could be missed - Non-text files (e.g. scanned documents) and embedded text, objects, or visuals will not be indexed. Different native extraction methods can also vary in their ability to recognize certain types of text.

Search Indexing Gotchas

OCR Index vs Text-based Index

OCR Index

● Pros: Embedded Images

○ Charts

○ Budgets

○ Scanned Documents

● Cons: Hidden Fields

○ Excel hidden cells

○ Comments

○ Tracked Changes

○ BCC Field

Text-Based Index

● Pros: Hidden Fields

○ Excel hidden cells

○ Comments

○ Tracked Changes

○ BCC Field

● Cons: Embedded Images

○ Charts

○ Budgets

○ Scanned Documents

These indexes are each powerful, but the strength of one is the weakness of the other. Which do you choose?

Hidden Sheets - OCR Con / Text-based Pro

Search: Joe Connor

Searching “Joe Connor” in a typical OCR index would not return this document as a hit.

The native document shows the hidden sheet where “Joe Connor” appears.

Searching “Joe Connor” in a Text-based index does return the same document as a hit.

OCR indexes miss hidden fields/pages like this one.

Embedded Images - Text-based Con / OCR Pro

Search: NW Alum Smelters (PowerPoint slide with embedded image, searched in a Text-based index)

All of the text that appears in the PowerPoint slide is on an embedded image, so it is not “readable” by a Text-based index.

● The OCR index is able to lift the text off the document.

● Searching “NW Alum Smelters” in an OCR index returns this document as a hit.

● Extracted Text-based indexes miss text in embedded or attached images.

Metadata Index

Invisible fields with important information

Metadata is field-level file information used in review. It is often delivered with a production as part of a load file, or retrieved from native files. Metadata is information about your data and can produce very valuable evidence.

Metadata Field Name - Use in Review

● DocOriginalTitle - The original title of a document

● Date/Time Sent, Date/Time Received - Show when emails were sent and received

● Sender, Recipients, CC and BCC - Show who sent and received emails

● Date Last Modified - Usually the best date field for files collected normally (without a forensic collection)

● File Extension / File Type - Show the type and quantity of ESI produced
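Because metadata arrives as fields, it can be searched field by field. The minimal sketch below assumes a Concordance-style DAT load file. Such files commonly use ASCII 20 as the field separator and þ (ASCII 254) as the text qualifier; confirm this against your own production. The field names are hypothetical and modeled on the table above.

```python
import csv
import io

# Assumed Concordance-style delimiters: ASCII 20 between fields, "þ" around values.
SEP, QUOTE = "\x14", "\xfe"


def q(*fields: str) -> str:
    """Build one load-file line from plain values (used only to create sample data)."""
    return SEP.join(f"{QUOTE}{f}{QUOTE}" for f in fields)


LOAD_FILE = "\n".join([
    q("DocID", "Sender", "Bcc", "DateSent", "FileExtension"),
    q("DOC-0001", "ann@example.com", "joe@example.com", "2017-10-03", "msg"),
    q("DOC-0002", "joe@example.com", "", "2017-11-14", "msg"),
    q("DOC-0003", "", "", "", "xlsx"),
])


def read_load_file(text: str) -> list[dict[str, str]]:
    """Parse the load file into one dict per document, keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=SEP, quotechar=QUOTE)
    return list(reader)


def metadata_search(records: list[dict[str, str]], field: str, value: str) -> list[str]:
    """DocIDs whose metadata field contains the value (case-insensitive)."""
    return [r["DocID"] for r in records if value.lower() in r.get(field, "").lower()]


records = read_load_file(LOAD_FILE)
print(metadata_search(records, "Bcc", "joe@"))             # ['DOC-0001']
print(metadata_search(records, "FileExtension", "xlsx"))   # ['DOC-0003']
```

The BCC query shows why metadata matters. It finds the one document where Joe was blind-copied, which a body-text search would not reveal.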

Foreign Language and Translated Text

Complexities of Translating Documents

Choices to make when you have foreign language documents:

● Machine Translation (not perfect, but more affordable)

● Human Translation (can be very good, but expensive)

● Hire foreign language staff for search (can be very good, but expensive)

● Create foreign language search terms and have only responsive documents translated (not perfect, but more affordable)

You find yourself asking: “What might I miss?” vs. “How much money am I wasting?”

Further complicating the matter: tying translated documents to the original for production often has to be done manually.
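One way to reduce that manual work is to pair each translation with its original by file name and flag any translation with no matching original for follow-up. This assumes translations are delivered under a consistent naming convention; the `_EN` suffix below is hypothetical.

```python
import re

# Hypothetical naming convention: a translation is named <original stem>_EN.<ext>.
TRANSLATION_NAME = re.compile(r"^(?P<stem>.+)_EN(?P<ext>\.[^.]+)$")


def pair_translations(filenames: list[str]) -> tuple[dict[str, str], list[str]]:
    """Return ({original: translation}, [translations with no matching original])."""
    names = set(filenames)
    pairs, orphans = {}, []
    for name in sorted(filenames):
        m = TRANSLATION_NAME.match(name)
        if not m:
            continue                      # not a translation
        original = m.group("stem") + m.group("ext")
        if original in names:
            pairs[original] = name
        else:
            orphans.append(name)          # flag for manual follow-up
    return pairs, orphans


files = ["ABC0001.pdf", "ABC0001_EN.pdf", "ABC0002.pdf", "ABC0003_EN.pdf"]
pairs, orphans = pair_translations(files)
print(pairs)    # {'ABC0001.pdf': 'ABC0001_EN.pdf'}
print(orphans)  # ['ABC0003_EN.pdf']
```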

Concatenated Index

Combines 2 or More Indexes

● Blends the OCR and Text indexes into a single searchable database

● Regardless of whether the text appears in a hidden field (Text-based) or an embedded image (OCR-based), your document will be responsive to search

● Lexbe’s Uber Index seamlessly blends the OCR and Text-based indexes, and additionally includes the translation and metadata indexes

○ Apply advanced Boolean search to pull text from either the OCR or Text-based index and get better results

○ Translated documents can be searched in either language, and all of the search expanders can be applied (fuzzy, concept, etc.)

Blending All Available Indexes

Having only one index would miss information on a document like this. A concatenated index would pull all of the text, both on the embedded image and in the comments, into a single searchable database.
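The following toy sketch shows the concatenation idea. It is not Lexbe's implementation. Each document's OCR text, native text, metadata values and translation are joined into one searchable string. The `DocumentText` class and the sample records are hypothetical; the records reuse this webinar's "Joe Connor", "NW Alum Smelters" and "caballo" examples.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentText:
    """Text captured for one document from each source (hypothetical sample data)."""
    doc_id: str
    ocr_text: str = ""      # text lifted from the rendered image
    native_text: str = ""   # text extracted from the native file (hidden sheets, comments)
    metadata: dict[str, str] = field(default_factory=dict)
    translation: str = ""   # machine or human translation, if any

    def concatenated(self) -> str:
        """Blend every source into one lowercase string for searching."""
        parts = [self.ocr_text, self.native_text, *self.metadata.values(), self.translation]
        return " ".join(parts).lower()


def search_all(docs: list[DocumentText], phrase: str) -> list[str]:
    return [d.doc_id for d in docs if phrase.lower() in d.concatenated()]


docs = [
    DocumentText("XLS-1", ocr_text="Q3 Budget Summary",
                 native_text="Q3 Budget Summary Joe Connor"),  # name only on a hidden sheet
    DocumentText("PPT-1", ocr_text="NW Alum Smelters",
                 native_text="Slide 1"),                        # text only in an embedded image
    DocumentText("MSG-1", native_text="El caballo llegó tarde.",
                 metadata={"Bcc": "joe.connor@example.com"},
                 translation="The horse arrived late."),
]

# Single indexes each miss something.
ocr_only = [d.doc_id for d in docs if "joe connor" in d.ocr_text.lower()]
text_only = [d.doc_id for d in docs if "nw alum smelters" in d.native_text.lower()]
print(ocr_only, text_only)                   # [] []

print(search_all(docs, "Joe Connor"))        # ['XLS-1']
print(search_all(docs, "NW Alum Smelters"))  # ['PPT-1']
print(search_all(docs, "horse"), search_all(docs, "caballo"))  # ['MSG-1'] ['MSG-1']
```

A production index would still tokenize and apply the search expanders to the blended text. Plain substring matching is used here only to keep the sketch short.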

Uber Index

Lexbe Document Viewer

All of these tabs allow you to easily toggle between different views of the document.

Multi-Language Search Results

When “A Horse” is searched, this document is identified as a hit, with “caballo” identified as the translated term.

Translation Tab

The Lexbe Document Viewer shows both the translated and original versions of the document, so you can easily toggle between them.

Evaluating your Platform Index

Considerations

● Many platforms offer a single index, OCR or Text

○ What might you miss?

● Some platforms offer both, but you have to search them separately

○ More time spent searching

● Metadata can contain crucial evidence

○ Ensure that your platform receives and indexes metadata in a way that is searchable

● Foreign language documents add another level of complexity

○ Does the platform offer automatic language detection?

○ Will the platform pull the original and translated versions together?

Summary

● The quality of your search index can directly affect the outcome of your case.

● Foreign language documents can add a level of complexity, especially if your platform is not set up to receive translated documents properly.

● The Lexbe Uber Index helps ensure that you do not miss any evidence by capturing all text, whether embedded in an image (OCR) or typed (Text-based), and by pulling metadata and translated text into a single searchable index.

Lexbe’s Concatenated Index: Uber Index

Four Critical Data Sources Feed the Lexbe Uber Index℠, Resulting in the Industry’s Most Complete Index and Search Platform.

Thank You For Attending

We’ll be making the following available to webinar attendees:

● A recorded streaming version

● MP3 podcast

● PDF

Please let us know if you have any questions or comments about this webinar, or suggestions for future topics. This webinar is part of the Lexbe eDiscovery Webinar Series. For notices of future live and on-demand webinars in this series, please email us at [email protected] or follow us on LinkedIn.

Thank You

Presenter: Erin Derby, [email protected], (512) 956-5594

Moderator: Frank [email protected], (512) 649-2440

Webinar Questions: [email protected]