Advanced redaction whitepaper


Advanced Redaction Technology: How to Provide Secure Access, Reduce Costs and Anticipate Future Legislative Requirements

More than 11 million adult Americans were victims of identity theft in 2009, a 10% increase from 2008. The collective cost of these crimes was over $54 billion. Private information within online public records, such as Social Security numbers, can be vulnerable to cyber criminals. An estimated 10% of identity theft cases originate with personal data collected from government records.

A Growing Legislative Concern

Across the United States, strict legislation is being passed requiring state and county governments to redact sensitive information, such as Social Security and credit card numbers, from the official and public record. North Carolina, Pennsylvania, Iowa and Wisconsin recently passed records modernization laws mandating the redaction of Social Security numbers. In some states, forward-thinking local and state government officials have independently determined that it is their responsibility to protect constituents from identity theft.

Source: National Conference of State Legislatures, updated May 2010.


Redaction Defined

Redaction, sometimes referred to as sanitization, is the permanent removal of personal or sensitive information from hard copy or electronic documents. The traditional technique for redacting confidential material from a paper document before its public release involves crossing out portions of text with a wide black pen and then photocopying the result. Manually processing thousands or millions of document pages this way is time-consuming and can strain staff resources.

As public records repositories shift from paper documents to electronic images, the challenges facing state and local governments are also shifting. Complying with state legislation and Federal “Sunshine” laws, such as the Freedom of Information Act and Openness in Government Act, requires a records management system, strategy and workflow that provide both data security and accessibility. Automated redaction technology is a powerful tool for securing personal data while maintaining public record access.

Automated Redaction Technology

As public records are scanned, the electronic images are processed through Optical Character Recognition (OCR) software that converts them into machine-readable text. This conversion allows the documents to be searched by a rules-based search engine that locates sensitive data within the OCR results. The search engine is powered by rules, clues, pattern recognition and algorithms designed to locate user-defined sensitive data types. After locating a candidate, the software assigns a value that measures how well the data matches the pattern and clues.

For example, the search engine may find a Social Security number by finding the clue “SSN” followed by a pattern of numbers such as 123-45-6789. This example falls into the “high confidence” range of results, where the clue and the number pattern found by the search engine are an exact match.
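The clue-and-pattern search described above can be sketched in a few lines of Python. This is a minimal illustration and not the method used by any particular product: the regular expressions, the function name find_ssn_candidates and the "medium" and "low" tiers are assumptions made for the example. Only the idea of a clue ("SSN") followed by a number pattern, with an exact match rated "high confidence", comes from the text.

```python
import re

# Hypothetical rule: the clue "SSN" (optionally followed by ":" or "#"),
# then a run of digits and dashes. Names and tiers are illustrative only.
CANDIDATE = re.compile(r"\b(SSN)\b\s*[:#]?\s*([\d-]+)", re.IGNORECASE)
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def find_ssn_candidates(ocr_text):
    """Return (matched_text, confidence) pairs for clue-plus-number hits."""
    results = []
    for match in CANDIDATE.finditer(ocr_text):
        value = match.group(2).strip("-")
        digits = value.replace("-", "")
        if not digits:
            continue  # the clue was followed by dashes only
        if SSN_PATTERN.fullmatch(value):
            confidence = "high"    # clue and pattern are an exact match
        elif 8 <= len(digits) <= 10:
            confidence = "medium"  # clue present, pattern nearly matches (assumed tier)
        else:
            confidence = "low"     # clue present, pattern far from an SSN (assumed tier)
        results.append((value, confidence))
    return results


if __name__ == "__main__":
    sample = "Grantor SSN: 123-45-6789. Co-signer SSN 12345678."
    for value, confidence in find_ssn_candidates(sample):
        print(value, confidence)
```

A production rule set would also need to tolerate OCR misreads (for example a letter "O" in place of a zero) and clues other than "SSN"; those are left out here to keep the sketch short.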
On another document, the search engine may find the clue “SSN” followed by eight digits instead of the standard nine-digit Social Security number. This result may be defined as “medium confidence.” These values or “confidence” classifications are used to streamline verification workflow.

Accuracy

Accuracy refers to a calculation that compares the number of sensitive data fields found by the software with the total number of sensitive data fields within the record. False positives occur when software locates non-sensitive data and marks it for redaction. This type of error is included in the overall accuracy rate.

Accuracy is arguably the most important feature of automated redaction. Because no industry standard exists for calculating accuracy, evaluating and comparing the accuracy rates among redaction providers can be challenging. To help facilitate the evaluation process, vendors’ accuracy formulas must be transparent and straightforward. If each sensitive data item undetected by the software represents a failure to protect a citizen’s private information, it stands to reason that the software’s accuracy rate should be downgraded for every occurrence of this type.

Pre-verification accuracy measures how well the software locates sensitive data automatically, without human intervention for quality assurance (verification). Achieving a high pre-verification accuracy rate is critical for two reasons. When redaction software automatically finds virtually all sensitive data within records, the security of individuals’ personal data is increased. Additionally, high pre-verification accuracy dramatically reduces labor costs.

Verification

An important part of any redaction workflow is verification or quality control. The two most influential factors that affect verification are the quality of the paper records before scanning and the targeted level of accuracy (higher accuracy requires more verification). Verification workflow is based on the particular needs of each client, and generally includes three options:

1) Fully Automated Redaction, where the software finds and redacts sensitive information automatically.
2) Semi-Automated Redaction, where an end user verifies each redaction.
3) Hybrid Redaction, where user-defined “high confidence” redactions are processed automatically while lower confidence results are submitted for verification.

Impact of Pre-Verification Accuracy on Labor Costs

To demonstrate the relationship between software accuracy and verification labor costs, consider a government office processing 40,000 image pages of records per month using the Hybrid Redaction workflow. Software #1 has a pre-verification accuracy rate of 80% and Software #2 has a pre-verification accuracy rate of 99%.

Verification Labor Costs: 40,000 Pages of Records/Month Using Hybrid Redaction Workflow

                                      Software #1       Software #2
                                      (80% Accuracy)    (99% Accuracy)
Pages Processed/Day                   1,905             1,905
Pages to be Verified/Day              381               19
Verification Labor in Hours/Day       1.5               0.08
Verification Labor in Hours/Month     31.5              1.7
Verification Labor in Hours/Year      378               20
~ Annual Verification Labor Costs     $7,500            $400
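The arithmetic behind the table can be approximated with a short Python sketch. The whitepaper does not state the working days per month, the verification speed or the hourly labor rate. The three constants below are assumptions chosen because they roughly reproduce the published figures, and the function name verification_labor is hypothetical. The one relationship taken from the table itself is that the pages sent for verification equal the pages processed multiplied by one minus the pre-verification accuracy.

```python
# Assumed inputs: none of these three values appears in the whitepaper.
# They are back-calculated guesses that approximately reproduce its table.
WORKING_DAYS_PER_MONTH = 21
PAGES_VERIFIED_PER_HOUR = 254
HOURLY_LABOR_RATE = 20.00  # dollars per hour


def verification_labor(pages_per_month, pre_verification_accuracy):
    """Estimate verification workload and cost for a hybrid redaction workflow."""
    pages_per_day = pages_per_month / WORKING_DAYS_PER_MONTH
    # Pages the software could not handle confidently go to a human verifier.
    pages_to_verify = pages_per_day * (1.0 - pre_verification_accuracy)
    hours_per_day = pages_to_verify / PAGES_VERIFIED_PER_HOUR
    hours_per_month = hours_per_day * WORKING_DAYS_PER_MONTH
    hours_per_year = hours_per_month * 12
    return {
        "pages_per_day": pages_per_day,
        "pages_to_verify_per_day": pages_to_verify,
        "hours_per_day": hours_per_day,
        "hours_per_month": hours_per_month,
        "hours_per_year": hours_per_year,
        "annual_cost": hours_per_year * HOURLY_LABOR_RATE,
    }


if __name__ == "__main__":
    for label, accuracy in (("Software #1", 0.80), ("Software #2", 0.99)):
        estimate = verification_labor(40_000, accuracy)
        print(label, {key: round(value, 2) for key, value in estimate.items()})
```

Because the table rounds each row before computing the next, the sketch's unrounded results differ slightly from the published numbers, most visibly in the annual cost rows.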


Selecting a Redaction Provider

The redaction vendor selection process should consider 1) experience, 2) accuracy and 3) overall technology.

1. Experienced redaction providers have completed installations with many different types of records management software. The exposure to different systems helps seasoned customer support teams anticipate problems before they happen. Verification labor is often the highest cost within a redaction project. Working with a team experienced in verification workflow maximizes accuracy, minimizes human intervention and saves money.
2. The quality of paper records and the complexity of the data to be redacted affect the accuracy that can be achieved for each project. Under most circumstances, high quality automated redaction can achieve a pre-verification accuracy rate of 95%.
3. Redaction is an evolving technology. Top vendors are constantly adding new technology to improve accuracy and speed, and to meet the emerging needs of governments.

Privacy and Information Security Regulations: What Does the Future Hold?

The threat of unauthorized access to sensitive information within public records is unlikely to diminish in the near future. This persistent threat may pave the way for additional federal and state data security measures. Government offices that are complying with existing regulations to redact Social Security numbers may face legislative mandates in the future that require the redaction of additional data types. In fact, this is already happening. In 2003, the Florida legislature mandated the redaction of Financial Account information including bank, credit and debit card numbers. Similarly, the Nevada legislature issued a revised statute in 2006 to mandate the redaction of Drivers’ License numbers, Identification Card numbers and Financial Account information including bank, credit and debit card numbers.
Government agencies can successfully navigate the redaction of additional data fields in the future by leveraging today’s technology. As records are being processed, reports can be created that identify specific documents that contain the additional data fields (credit card number, drivers’ license number) that may need to be redacted in the future. This captured information can be used to create a budget for the additional verification process and to isolate suspected images for automatic/manual redaction processing.

The passage of time presents some problems for this approach. Documents change and data capture tools and techniques improve rapidly. Using rules and capture technology from a previous project may decrease accuracy and/or increase verification labor costs. To maintain accuracy and keep manual labor costs low, a better solution may be to save the OCR output from the original project to avoid incurring the cost of rescanning, and write new custom rules for subsequent mandates as they arise.
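As a rough illustration of applying a new rule to saved OCR output, the Python sketch below scans stored text for likely credit card numbers and reports which documents contain them. Everything in it is an assumption made for the example: the document ids, the function names, the 13 to 16 digit pattern and the use of the Luhn checksum to discard digit runs that cannot be card numbers. The whitepaper does not describe how any product implements such rules.

```python
import re

# Hypothetical rule for a newly mandated data type: 13 to 16 digits,
# optionally separated by single spaces or dashes.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def passes_luhn(digits):
    """Luhn checksum: double every second digit from the right and sum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        d = int(char)
        if index % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def report_documents(saved_ocr):
    """Map document id to the count of likely card numbers in its saved OCR text."""
    report = {}
    for doc_id, text in saved_ocr.items():
        hits = 0
        for match in CARD_PATTERN.finditer(text):
            digits = re.sub(r"[ -]", "", match.group())
            if passes_luhn(digits):
                hits += 1
        if hits:
            report[doc_id] = hits
    return report


if __name__ == "__main__":
    # Made-up OCR text; 4111 1111 1111 1111 is a widely used test number.
    saved_ocr = {
        "doc-001": "Card 4111 1111 1111 1111 on file",
        "doc-002": "Ref 1234 5678 9012 3456",
        "doc-003": "SSN 123-45-6789",
    }
    print(report_documents(saved_ocr))
```

A report like this gives the per-document counts that could feed a verification budget, and rerunning it with revised rules needs only the saved OCR text, not a rescan.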

Conclusion

At a minimum, redaction software can help government agencies make public information available in a secure manner. Advanced technology can be harnessed to save labor costs and eliminate a significant percentage of tedious data entry tasks. Government agencies can gain significant, ongoing benefits from selecting a software partner with leading edge technology.

ID Shield Redaction Software

ID Shield is a proven, cost-effective redaction solution that permanently removes private information within records and documents. ID Shield Redaction Software customers have redacted over one billion images. Extract Systems offers server-based and desktop redaction software.

About the Author

Mark Miller is Vice President of Sales for Extract Systems, a leading provider of best-in-class data capture and redaction software. Extract’s products are built to adapt and integrate into any type of environment, providing flexibility, scalability and efficiency. The productivity gains achieved with Extract Systems’ data automation solutions save organizations money, improve workflow and eliminate paper.

For more information, please contact:

Extract Systems, LLC
6418 Normandy Lane, Suite 200
Madison, WI 53719
Phone: (877) 778-2543 or (608) 216-7950
E-mail: [email protected]
www.extractsystems.com

Sources:
Javelin’s 2010 Consumer Identity Fraud Report
National Conference of State Legislatures