Data Protector Suite
IRI, The CoSort Company
Vendor Background
● Specializing in fast data management and data-centric security
● Privately owned and profitable since 1978
● Sales and support in more than 40 cities worldwide
● 8 of 9 products share the same metadata and Eclipse IDE
● Featured in: CIO Review (top GRC and Compliance vendors); DBTA; Gartner Market Guide to Data Masking Tools; and the QY, Markets & Markets, and Research & Markets forecast reports on Data Masking, DB Security, Data Classification, and Data Governance
Selected IRI Data Masking Customers & Partners
Most IRI data masking customers profile and protect PII in databases, flat files and Excel sheets on premise, or in the cloud. Recent engagements also involve NoSQL DBs, documents, images, and faces. Streaming and Hadoop sources are also supported. Sites doing IRI mask/test work include:
IRI Data Masking Tool Architectures
© 2019 Innovative Routines International (IRI), Inc. All Rights Reserved.
Voracity data masking, cleansing, transformation, migration, reporting, and wrangling jobs can be created and run inside or outside of IRI Workbench.
Job design methods supported inside:
1) Job creation wizards
2) Color-coded, syntax-aware job script editor with outline
3) Form editors
4) Graphical parameter dialogs
5) Mapping Diagrams
Job design methods supported outside:
6) Erwin Mapping Manager
7) Any external text editor
8) 3GL app (system or API calls)
Multiple Masking Job Design Options
1) 4GL scripts on command line or in batch.
2) From 3rd party automation tools like Stonebranch UAC, cron, etc.
3) Directly in BIRT or KNIME in Eclipse, or a Splunk add-on app, as you report or index.
4) Some jobs run without code changes in Hadoop via MR2, Spark, Spark Stream, Storm or Tez.
5) Use graphical run configuration dialogs or the built-in task scheduler to launch local, remote, or HDFS jobs from IRI Workbench
6) System or API calls from 3GL programs
And Multiple Job Deployment Options
IAM/RBAC Now & Later
Today, you can assign permissions to data (file) sources, IRI masking programs (sortcl.exe), and the scripts they run (spec.fcl) in LUW file systems using central LDAP/AD settings. You can optionally control them via Apache Directory Studio in IRI Workbench:
Soon, the IRI client/server governance system illustrated to the right will assign and enforce RBACs to the same elements and to more granular elements like field names (mapped from data classes), functions, and even specific data values or ranges of values.
Cloud Data & Systems Support
FieldShield can read, mask, and write data in cloud databases like Snowflake, MS SQL in Azure, AWS Redshift, et al. via J/ODBC, and data streaming through URLs. DarkShield will soon support cloud sources like Amazon, Google, FB, et al. (right now it is local/SMB only). Both operate on local, remote, or cloud systems running Windows, Linux, or Unix.
Metadata Integrations
1. Voracity tooling consumes metadata from any structured source for data classification, profiling, search, de-ID, ETL, etc.
2. FieldShield & RowGen job scripts also produce metadata for several DB load utilities in multi-DB masking & test data jobs.
3. Their data definition file metadata (e.g. target field layouts) can also be exported in CSV for catalog tools like Collibra.
4. DarkShield reads attribute metadata about source files, and produces artifactual metadata from its search and mask ops; it can auto-forward that information to, or populate, Splunk ES for analysis, dashboarding, or adaptive responses.
5. MIMB and Erwin Mapping Manager can hub and feed FieldShield DDF and .FCL specs based on third-party metadata.
6. All IRI metadata (including data source/target layouts, job/task specification and batch files, workflows and metamodels, discovery configurations, search matchers, and masking rules) can also be team-shared, secured, and version-controlled in Git et al.
IRI Data Protector Suite
What FieldShield Does
● Connects and interacts with multiple sources and targets, on-premise or cloud
● Discovers and classifies sensitive data in DB, flat-file, and dark-data sources
● Protects fields with PII, PHI, etc. via 14 built-in masking function categories
● Addresses multiple protections and recipients in one job script, one I/O
● Applies masking rules across tables to preserve referential integrity
● Secures data conditionally, i.e. based on patterns, values, or ranges
● Delivers data for portability and redacts it for GDPR compliance
● Masks DB data dynamically using C/C++, Java, or .NET SDK functions
● Retains data realism (e.g. FPE and pseudonyms) for testing and outsourcing
● Masks inside big data BI/analytic, ETL, migration, sub-setting, and test data jobs
● Determines statistical likelihood of re-identification (risk scoring) for HIPAA compliance
● Logs job and system runtime details to an XML audit file to verify compliance
● Supports streaming input and Hadoop execution paradigms within Voracity
IRI FieldShield
Search Score Mask Audit
FieldShield Data Sources (Standard)
Acucobol Vision Delimited Line Sequential SQL Server
Altibase (FACT) Derby (WB) MaxDB SQLite
ASN.1 TAP3 ESDS MF-ISAM Sybase ASA/E & IQ
BIRT DB (WB) Excel (WB) WF Var. Length Tibero (WB)
BIRT Hive (WB) ELF web logs MySQL Teradata (WB)
BIRT JDBC (WB) Fixed Oracle Text
BIRT POJO (WB) Heap / print Outlook (WB/DS) UTF-8 & 16
C-ISAM HSQLDB (WB) PDF (WB/DS) Variable Block
CLF web logs IDX 3, 4 & 8 PostgreSQL/EDB Variable Sequential
CSV Informix Powerpoint (WB) VSAM MVS (UniKix)
DB2 (UDB) Ingres Record Sequential Web Services (WB)
DB2 for i5/OS (WB) JSON RTF (WB) Word (WB)
DB2 for z/OS (WB) LDIF SQL Anywhere XML
FACT: requires IRI Fast Extract (FACT)
DS: requires IRI DarkShield
WB: requires IRI Workbench, the free Eclipse GUI for FieldShield, etc.
FieldShield Data Sources (Legacy)
Accessible via IRI partner (CONNX) J/ODBC drivers
Access D3 GA-Power 95, R91 K-ISAM Pathway RMS
Adabas Datacom Gemstone Knowledgeman PDS Reality/X
Advanced Pick Dataflex GENESIS KSDS PervasiveSQL RRDS
ALLBASE Db4o Gigabase Lotus Pick/Pick64+ SAP HANA
Alpha5 dBase H2 Manman PI-Open Sequoia
Amazon RDS Desktop Adapter IDMS Mentor / pro Powerflex Sharebase
Azure DL/1 IDS MO Powerhouse Supra
BizTalk DSM Image Model 204 Progress Terracotta
Cache Enscribe IMS Mumps QueryObject Total
Clipper Enterprise Adapter Interbase MyBase rBase Ultimate
Codasyl FileMaker Intersystems Netezza R83 UltPlus
CorVision Firebird ISM NonStop SQL Rdb Unidata
ConceptBase Focus Jasmine ObjectStore REALITY Universe
D-ISAM FoxPro JBase Paradox Red Brick VSAM VSE
FieldShield Data Sources (Modern, in Voracity)
Amazon EMR Hive FinancialForce Marketo Pivotal Greenplum
Apache Cassandra Force.com apps MongoDB Pivotal HD Hive
Apache Hadoop Hive Hortonworks Hive MS Dynamics CRM Salesforce.com
Cloudera CDH Hive Hubspot MS SQL Azure ServiceMAX
Cloudera Impala Lightning Connect Oracle Eloqua Spark SQL
Database.com MapR Hive Oracle Service Cloud Veeva CRM
Sensitive Data Discovery - Multiple Wizards
To facilitate data masking, IRI FieldShield includes: PII definition (cataloging through data classes); discovery through string (literal or dictionary), pattern, and fuzzy-logic searches; statistical reporting; and automatic metadata creation.
Fit-for-purpose GUI wizards deliver:
● DB and file data classification, with search and masking rule matchers
● DB profiling, ERDs, and table searches
● Flat-file profiling and value searches
● Data class searches through schema and directories for bulk discovery
● Metadata discovery and definition
● Dark data search and structuring, with metadata reporting (see DarkShield)
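The three search styles named above (dictionary, pattern, and fuzzy) can be sketched generically as follows. This is a minimal illustration, not FieldShield's implementation: the pattern, the lookup set, the similarity threshold, and every function name are assumptions made for the example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical search matchers; none of these names come from FieldShield.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # pattern search
SURNAMES = {"smith", "garcia", "nguyen"}               # dictionary (lookup) search


def fuzzy_match(word, candidates, threshold=0.8):
    """Return True if word is similar enough to any candidate."""
    return any(SequenceMatcher(None, word, c).ratio() >= threshold
               for c in candidates)


def classify(value):
    """Return the set of data classes one field value appears to belong to."""
    classes = set()
    if SSN_PATTERN.search(value):
        classes.add("SSN")
    for word in re.findall(r"[A-Za-z]+", value.lower()):
        if word in SURNAMES:
            classes.add("SURNAME")
        elif fuzzy_match(word, SURNAMES):
            classes.add("SURNAME?")   # close but not exact: flag for review
    return classes


def profile(rows):
    """Count how many values in each column matched each data class."""
    report = {}
    for row in rows:
        for column, value in row.items():
            for cls in classify(str(value)):
                key = (column, cls)
                report[key] = report.get(key, 0) + 1
    return report
```

A report keyed by (column, data class) is one simple way to turn search hits into the kind of per-column statistics a masking rule could later be attached to.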
Static Data Masking Functions (1-3 of 15)
Encryption / Decryption
● 3DES EBC & SSL
● AES-128 & -256 CBC
● AES-256 Format-Preserving
● GPG (PGP-compatible)
● FIPS-compliant OpenSSL
● Custom
Character Scrambling
● For ASCII data
● Less secure
● Reversible
Encoding / Decoding
● Converts binary to ASCII
● Supports base64 & hex
● Reversible
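As a small illustration of the encoding/decoding category, the sketch below converts bytes to base64 or hex text and back using the Python standard library. The function names are hypothetical and are not FieldShield function names.

```python
import base64
import binascii


def encode_field(raw: bytes, scheme: str = "base64") -> str:
    """Encode arbitrary bytes as printable ASCII text."""
    if scheme == "base64":
        return base64.b64encode(raw).decode("ascii")
    if scheme == "hex":
        return binascii.hexlify(raw).decode("ascii")
    raise ValueError(f"unsupported scheme: {scheme}")


def decode_field(text: str, scheme: str = "base64") -> bytes:
    """Reverse encode_field; no key is needed, only the scheme."""
    if scheme == "base64":
        return base64.b64decode(text)
    if scheme == "hex":
        return binascii.unhexlify(text)
    raise ValueError(f"unsupported scheme: {scheme}")
```

Because decoding needs no key, encoding on its own obscures a value rather than securing it.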
Static Data Masking Functions (4-6 of 15)
Pseudonymization
● Provides realistic names
● Reversible lookup values
● Non-reversible selection
Redaction / Obfuscation
● Partial/full-field masking
● Conditional omission
● Non-reversible
Randomization
● Random data generation
● Random data selection
● Non-reversible
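The difference between reversible lookup pseudonyms, non-reversible selection, and partial-field redaction can be sketched as below. This is a generic illustration under stated assumptions: the function names and replacement values are hypothetical, and the replacement list is assumed to contain no duplicates.

```python
import hashlib


def redact(value, keep_last=4, mask_char="*"):
    """Mask all but the last keep_last characters, preserving field width."""
    if keep_last <= 0:
        return mask_char * len(value)
    visible = value[-keep_last:]
    return mask_char * (len(value) - len(visible)) + visible


def build_lookup(real_names, replacements):
    """Pair each distinct real name with one unique replacement (reversible)."""
    distinct = sorted(set(real_names))
    if len(distinct) > len(replacements):
        raise ValueError("not enough replacement values for a one-to-one mapping")
    forward = dict(zip(distinct, replacements))
    reverse = {fake: real for real, fake in forward.items()}
    return forward, reverse


def select_pseudonym(name, replacements):
    """Non-reversible: pick a replacement from a hash of the name (many-to-one)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return replacements[int.from_bytes(digest[:4], "big") % len(replacements)]
```

The lookup version can be undone by anyone holding the reverse table, so that table needs the same protection as a key. The selection version is repeatable but cannot be inverted, since several names can land on the same replacement.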
Static Data Masking Functions (7-15 of 15)
Hashing
● SHA-1 & 2 cryptographic
● Returns hash of field string
● Use for integrity checking
Expression Logic
● Mathematical operations
● PCRE logic
● Custom blurring
String Manipulations
● Find, replace, and add
● Reposition and trim
● Use INSTR information
Blurring & Bucketing
Add random “noise” (perturbate) to ages/dates, and generalize (anonymize) quasi-identifiers
Tokenization
DB-value substitute for PCI DSS
Custom Functions
User’s field-level call
Deletion & Suppression
Erasure for GDPR Right to Be Forgotten
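To make two of the functions listed above concrete, here is a minimal sketch of hashing a field string and of blurring and bucketing ages and dates. It uses only the Python standard library. The function names, the noise ranges, and the bucket width are assumptions for illustration, not FieldShield syntax.

```python
import hashlib
import random
from datetime import date, timedelta


def hash_field(value: str) -> str:
    """Return the SHA-256 (SHA-2 family) hex digest of a field string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def blur_date(d: date, max_days: int, rng: random.Random) -> date:
    """Shift a date by a random number of days within +/- max_days."""
    return d + timedelta(days=rng.randint(-max_days, max_days))


def blur_age(age: int, max_years: int, rng: random.Random) -> int:
    """Add random noise to an age, never going below zero."""
    return max(0, age + rng.randint(-max_years, max_years))


def bucket_age(age: int, width: int = 10) -> str:
    """Generalize an exact age into a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"
```

Passing in a seeded `random.Random` makes a blurring run repeatable for testing; a production job would more likely draw noise from an unseeded source.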
Query-Ready XML Audit Log
Re-ID Risk Determination
US HIPAA and FERPA regulations require that patient and student data sets used in research or marketing have a statistically certified “very small” chance of being re-identifiable.
● IRI risk scoring wizard produces re-ID probability scores in 3 modes
● Analyzes quasi-identifiers with multiple, peer-reviewed functions
● Detailed and graphed scoring reports
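One common way to estimate re-identification risk from quasi-identifiers is to group records into equivalence classes and treat 1 / class size as each record's risk (a k-anonymity style measure). The sketch below shows that generic calculation only. The slides do not say which functions or modes the IRI wizard uses, so nothing here should be read as its method, and all names are hypothetical.

```python
from collections import Counter


def reid_risk(records, quasi_identifiers):
    """Score re-identification risk from equivalence-class sizes."""
    if not records:
        raise ValueError("no records to score")
    # Records sharing the same quasi-identifier values form one class.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    per_record = [1.0 / class_sizes[k] for k in keys]
    return {
        "k": min(class_sizes.values()),            # smallest class size
        "max_risk": max(per_record),               # worst-case record
        "avg_risk": sum(per_record) / len(per_record),
        "unique_records": sum(1 for k in keys if class_sizes[k] == 1),
    }
```

Bucketing ages or truncating ZIP codes before scoring enlarges the classes, which is how generalization lowers these scores.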
MongoDB Masked
unmasked
Masking et al. in Hadoop, too
IRI FieldShield in Voracity
Map once, deploy anywhere
Dynamic Data Masking Options
Method | Operation
ODBC Select / Update | Apply FieldShield column masks to target (view) tables for specific users/rows
DB App Invocation | Use .NET or Java SDK library functions or system-call job scripts on the fly
In-Situ Redaction | User- and SQL-specific full and partial column masking on query
Custom I/O Procedures | Drive real-time application data directly to/from FieldShield jobs in memory
Real-Time Processing | Hadoop Spark and Storm processing of dynamic input streams (via Voracity VGrid)
Governance Mode | New runtime facility tied to RBAC/IAM infrastructure masks fields for some users
Encryption Key Management Options
1. Passphrase (key string) embedded in script
2. String as environment variable
3. String in (securable) key file
4. Multi-factor authentication via Azure Key Vault or Townsend Security Alliance Key Manager (which also features VM and HSM support)
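The first three options above differ only in where the passphrase string lives. A minimal sketch of resolving it from one of those three places follows. The function name, its parameters, and the precedence order (key file, then environment variable, then embedded string) are assumptions chosen for the example, and the key-vault option is not covered.

```python
import os
from pathlib import Path


def resolve_passphrase(embedded=None, env_var=None, key_file=None):
    """Return the passphrase from the first configured source.

    Order checked: key file, environment variable, embedded string.
    """
    if key_file is not None:
        # A key file can be locked down with file-system permissions.
        return Path(key_file).read_text(encoding="utf-8").strip()
    if env_var is not None:
        value = os.environ.get(env_var)
        if value is None:
            raise KeyError(f"environment variable {env_var} is not set")
        return value
    if embedded is not None:
        # Simplest option, but the string is visible to anyone with the script.
        return embedded
    raise ValueError("no passphrase source configured")
```

Checking the more securable sources first means an embedded string only takes effect when nothing stronger is configured.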
Masking Complex XML/HL7
In this method, FieldShield masks structured data that the Sonra Flexter pre-parsing tool has flattened into CSV subsets.
Alternatively, just use IRI DarkShield.
User Profiles
● Vertical industries and governmental agencies storing, processing, or outsourcing applications with sensitive data, such as:
○ Banks
○ Census / Tax
○ Defense
○ Health Care
○ Insurance
○ Schools
● Application, DB, and DW users handling sensitive data
● CISOs, compliance teams, consultants, IT managers, and solution architects
Use Cases
Tesco Bank/RBS UK
○ Decrypt and re-encrypt fields in credit card migration and test files
○ Generate and manage encryption and user ID keys
○ Other projects protect 38,265 records per minute on Windows
Accenture Singapore
○ Design and run encryption and masking jobs on Linux servers
○ Secure PHI for the Ministry of Health Holdings (MOHH)’s Oracle DB
○ Row sequencing and job audits
Medicx Media Solutions USA
○ Apply encryption and hashing functions to PII and PHI in geo-medical consumer health databases
○ Exceed HIPAA requirements in provisioning mScores™ data to digital and direct marketers
Key Differentiators
Developer Support
○ Version controls
○ Master data definition
○ Secure key management
○ Git project management (teaming)
○ SDK supports .NET and Java calls
○ Data profiling and metadata discovery
○ XML (and soon JSON) job logs, IAM
One-Stop-Shop
○ Integrated data classification & search
○ Includes re-ID risk scoring for HIPAA
○ Use w/Voracity ETL, migrate, cleanse
○ Metadata-compatible with RowGen TDM
○ Used in DB subsetting wizard
○ Also works in Voracity BI & KNIME jobs
○ Runs w/Actifio DB clones, Splunk ES, etc.
Price Performance
○ The data-centric security tool with:
➜ The most sources
➜ The most protection functions
➜ The most target file formats
○ Fastest standalone protection software
Ease-of-Use
○ Familiar Eclipse GUI
○ Self-documenting 4GL syntax
○ Easy management and modification of jobs/metadata
Competitive Advantages
vs. IBM
○ FieldShield scripts simpler than Optim interoperability model and Javascript options
○ Seamless integration with more sources
○ More functions
○ Lower cost
vs. Informatica
○ FieldShield DDM inclusive with product (compared to Informatica’s upgrade)
○ More SDM protection functions
○ Integration with Eclipse and Excel
○ Access to 4GL scripts
○ Lower cost
vs. CA (Grid Tools)
○ Built-in CoSort engine makes FieldShield faster than GT Fast Data Masking
○ Tight integration with data profiling, ETL, data quality, and BI operations
○ Multi-target/format options
○ Lower cost
○ Built-in re-ID risk determination wizard
vs. Imperva (Camouflage)
○ FieldShield has more masking and encryption functions
○ Hash, decode, and pseudonymize functions
○ Faster and more extensible in the IRI Workbench IDE
○ Lower cost
IRI Data Protector Suite
What CellShield EE Does
● Discovers, reports, and masks PII, and performs audit actions, in Excel 2010 & later
● Searches and secures PII in spreadsheets on one PC or throughout an SMB LAN
● Provides common, and allows new, search pattern definitions for PII formats
● Searches for strings in a dictionary, and finds/fixes PII floating in cells
● Supports reuse and sharing of patterns in project or cloud repositories
● Generates a report of all patterns found and opens it for action in a worksheet
● Opens applicable worksheets and highlights the located ranges for protection
● Encrypts, redacts, or pseudonymizes in one-pass with chosen functions and options
● Reveals data with the decryption key, or if reversible pseudonymization was used
● Overlays results directly into the affected cells, or in another worksheet
● Moves between, or bulk-remediates all, identified worksheets and ranges
● Auto-inserts protection details into an un-editable audit column in the report
IRI CellShield
Search Extract Mask Report
CellShield PII Discovery
The dark data profiling wizard in the IRI Workbench searches network-wide for sensitive data in spreadsheets based on user-specified (plus popular and saved) Java regular expressions (patterns):
CellShield Reporting
The report produced by the profiling wizard opens in a dynamic worksheet supported by an action dialog for protection and auditing activities:
CellShield Protection
Perform point-and-click encryption and decryption, masking (full or partial cell), or pseudonymization (reversible and non-reversible) of the applicable ranges within the spreadsheets in the report:
CellShield Intra-Cell Search & Mask
This feature finds and fixes floating PII, ad hoc or en masse.
CellShield Auditing
An uneditable log entry for the protection applied to each pattern identified in the report is automatically appended on each action:
What’s New in CellShield EE
Going into V2 Shortly
● Faster search/mask in volume
● Improved audit column support
What’s Planned Later
● DarkShield-side masking alternative
● SharePoint access
IRI Data Protector Suite
What DarkShield Does
IRI DarkShield
Search Extract Redact Audit
● Simultaneously scans, extracts, and de-IDs or deletes PII (and audits actions) in all supported file formats
● Finds faces or defined data classes tied to RegEx patterns, lookup sets, NER models, and/or image regions
● Builds, saves, and re-uses semi-supervised, machine learning models in project or cloud repositories
● Blacks out PII in images, blurs faces, and applies encryption (including FPE), pseudonymization, hashing, encoding, bit scrambling, redaction, or erasure functions to PII in text files and documents
● Writes masked files atop originals, or to different folders with the same file names and formats
● Shows search, remediation, and model training job status via real-time progress bars
● Shares search methods and masking functions with CellShield EE and FieldShield
● Generates logs of all values found or masked, along with IRI-compatible metadata for BI, queries, etc.
● Creates graphical, interactive displays of search and mask results, or hands off log files to Splunk
● Runs in IRI Workbench with other IRI and Eclipse tools, or from the command line
Granular Sourcing/Targeting
Use DarkShield’s dark data discovery wizard to find sensitive data in unstructured files LAN-wide, mask it, and re-target the results.
Apply width-preserving redaction, blackout, deletion, encryption, pseudonymization, and other data masking functions to protect PII and comply with data privacy laws like the GDPR.
Deletion Function
IRI FieldShield, DarkShield & CellShield and other features in Voracity combine to comply with GDPR (and thus CCPA, KVKK, etc.) provisions like:
● Discovery and De-Identification of PII and PI
● The right to be Forgotten (via erasure like this)
● Data Portability (via extraction and reformatting)
● Data Rectification (via discovery and cleansing)
Optionally and automatically extract all of the values you searched for (think GDPR data portability), plus the metadata associated with the files containing those values.
Image Formats
.TIFF before
.TIFF after.
BMP, GIF, JPx, and PNG also!
DarkShield supports both pre-trained OpenNLP Name Finder models and new Named Entity Recognition (NER) models that you can build and train inside its semi-supervised machine learning dialog. This iterative process improves the accuracy of searches for names and other nouns based on their Natural Language Processing (NLP) context in sentences.
Compare this method to other DarkShield search methods, like pattern and lookup matches, path filters, or bounding-box areas (for images).
NER & Machine Learning
Facial Detection & Trained Facial Recognition Masking
DarkShield can detect faces in any image and blur (all of) them, or just those it recognizes from your trained library of faces.
Search via Path Filters
Path filters allow the user to take the structure of a JSON file into account during searches. Additional filters for other formats like XML are being added now. Filtering by path:
● Ignores fields that do not match the filter
● Increases search speed, and narrows the scope of the search results
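The idea of a path filter can be sketched as follows: walk the parsed JSON, skip every leaf whose path does not match the filter, and run the value search only on what remains. The path notation (`$.a[0].b`), the use of a regular expression as the filter, and the function names are all assumptions for this example; the slides do not describe DarkShield's actual filter syntax.

```python
import json
import re


def walk(node, path="$"):
    """Yield (path, value) for every leaf in a parsed JSON document."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, f"{path}.{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from walk(child, f"{path}[{index}]")
    else:
        yield path, node


def search(document, path_filter, value_pattern):
    """Return leaves whose path matches path_filter and whose value matches."""
    path_re = re.compile(path_filter)
    value_re = re.compile(value_pattern)
    hits = []
    for path, value in walk(document):
        if not path_re.fullmatch(path):
            continue          # ignored: this field is outside the filter
        if isinstance(value, str) and value_re.search(value):
            hits.append((path, value))
    return hits
```

In the usage below, an email address inside a free-text `note` field is not reported, because only `email` fields pass the filter:

```python
doc = json.loads('{"customers": [{"email": "ann@example.com", "note": "cc bob@example.com"}]}')
hits = search(doc, r"\$\.customers\[\d+\]\.email", r"[\w.]+@[\w.]+")
```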
Documented CLI
Easily query, analyze, and format the results of search and mask operations through built-in reports and this graphical display.
Or, export DarkShield log data for visualization in your preferred BI tool, or to SIEM environments like Splunk ES, shown here.
It is also then possible to take actions through the Splunk Adaptive Response Framework or a Phantom playbook.
Current Benefits
1. Combines PII discovery, delivery, deletion, and reporting in multiple unstructured source formats into one or more ergonomic operations
2. Allows pattern definition reuse and combination to consolidate searches
3. Consolidates multiple right to be forgotten and data portability requests into the same find/fix operation through literal names or lookup-file matches
4. Supports multiple drives, nodes, and threads for searching and masking work
5. Operates in the same Eclipse job design and metadata environment, IRI Workbench, with related data governance and management activities
6. Features affordable licensing options (standalone, bundled, or free in Voracity)
7. Integration with IRI FieldShield/Voracity data classification and masking functions
8. Parameter serialization and modeling for easy modification and batch execution
Development Roadmap
1. More unstructured format support, including A/V, proprietary apps, cloud silos, etc.
2. Additional ergonomic convergence with structured and embedding sources
3. Plug-in integration with more SIEM tools, like IBM QRadar and SolarWinds, beyond the Splunk ES and Phantom Playbooks which are now supported
4. Additional logging and application integration options
IRI Data Protector Suite
What is DMaaS?
● IRI Data Masking as a Service (DMaaS) is a professional service engagement
● DMaaS makes use of trusted IRI ‘shield’ software products described above ONLY
● Certified IRI experts classify, discover, and de-identify PII of concern in supported silos
● Also available: HIPAA re-ID risk scoring and anonymization, and ‘fake PII’ for testing
● IRI services are performed under a SoW with NDA, BAA, or other data security terms
● Source data that cannot be sent is accessed via VPN or secure public/private cloud
● Data is only accessed by IRI engineers in the US or certified partners like Capgemini
● All data access, classification, discovery (search) and masking operations are logged
● Billing is hourly or daily, with project rates available; IRI software costs are subsumed
● Customer is responsible for payment of cloud infrastructure of their choice
IRI Data Masking as a Service (DMaaS)
Search Extract Mask Report
IRI DMaaS
User Profiles
● DBAs and sysadmins responsible for PAN, PHI, PII or other sensitive information
● Sites needing standard data classification and consistent masking functions
● CISOs without sufficient internal IT resources to do this work
● Data governance and C-suite officers subject to compliance audits
Use Cases
RBS / Tesco (PCI DSS)
○ Produced and implemented custom encryption for testing data in M&A
Confidential (HIPAA)
○ Cataloged and de-identified protected health information
University of Adelaide (Privacy Act)
○ Data classification, search, and de-identification of PII in massive PeopleSoft financial, HR, and campus test data schemas in Oracle
Also available with IRI Data Protector or Manager Suites, and the IRI Voracity Platform
What RowGen Does
● Creates synthetic but realistic random and random-real test data simultaneously
● Improves DB prototypes, application quality, benchmarking, and outsourced operations
● Uses standard DB DDL, production file, and custom metadata to define layouts
● Preserves structural and referential integrity of real EDW DBs for testing
● Produces data in any type, structure, volume, or value range, and under if conditions
● Synthesizes composite data values and custom (master) data formats
● Generates computationally valid and invalid NIDs (Codice Fiscale, etc.), SSNs, and CCNs
● Sets and graphs test data value distributions (linear, normal, random, etc.)
● Applies common attribute rules (like lookups) for pattern-matched field names
● Filters, transforms, and pre-sorts test data while it’s being generated
● Writes loader metadata and performs direct path loads for test DB populations
● Builds test flat-file and custom/structured detail and summary report targets
● Subsets and masks databases automatically for test purposes
● Provides SDK functions for generating test data in Java apps and Hadoop
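For credit card numbers, “computationally valid” usually means the number passes the Luhn check-digit test. The sketch below generates synthetic numbers that deliberately pass or fail that test. It is a generic illustration, not RowGen's generator; the function names and the default prefix and length are assumptions.

```python
import random


def luhn_checksum(digits: str) -> int:
    """Return the Luhn sum modulo 10 for a full number (0 means valid)."""
    total = 0
    for position, ch in enumerate(reversed(digits)):
        d = int(ch)
        if position % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10


def is_valid(number: str) -> bool:
    """True if the string is all digits and passes the Luhn test."""
    return number.isdigit() and luhn_checksum(number) == 0


def generate_ccn(prefix="4", length=16, valid=True, rng=None):
    """Build a synthetic card number that passes or fails the Luhn test."""
    rng = rng or random.Random()
    body = prefix + "".join(
        str(rng.randint(0, 9)) for _ in range(length - len(prefix) - 1)
    )
    check = (10 - luhn_checksum(body + "0")) % 10
    if not valid:
        check = (check + rng.randint(1, 9)) % 10   # any other digit fails
    return body + str(check)
```

Generating invalid numbers on purpose is useful for testing that an application's input validation rejects them.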
IRI RowGen
Use Existing Data Models and Metadata
Build Test Data for:
○ Altibase
○ CLF/ELF
○ COBOL
○ CSV
○ DB2
○ Hadoop
○ Hive
○ JSON
○ LDIF
○ MySQL
○ NoSQL DBs
○ Oracle
○ SQL Server
○ Sybase
○ Teradata
○ XML
DB Subsetting, Masking Optional
The included subsetting and test data generation wizards facilitate DB and EDW prototyping, as well as test data virtualization for DevOps. Masked and referentially correct copies of production table extracts ensure production data is safe and test data is realistic.
User Profiles
Anyone doing DB testing, app development, stress-testing, or benchmarking, including:
○ Developers (programmers)
○ DBAs and DW (ETL) architects
○ Analysts and consultants
Use Cases
Bank of Montreal
○ Generates safe, realistic 20GB Oracle tables with RI for query testing
MasterCard Peru
○ Synthesizes PAN and PII in files to support OLTP and app testing
Transitive UK
○ Simultaneously creates and transforms data to test cross-OS virtualization
Key Differentiators
1. Big data generation and population performance (embedded CoSort pre-sorting engine speeds bulk loads)
2. Synthetic data that’s broader and safer than real data
3. Concurrent test data manipulation and custom report outputs
4. Simple, portable, and modifiable test data generation and auto-built DB loader scripts, all managed visually in Eclipse
5. Metadata compatibility with IRI software, Erwin (AnalytiX DS), and MIMB to facilitate test data generation for 3rd-party BI, CRM, and ETL tools
What’s New in RowGen
Recently Added
● Ability to generate Data Vault test data
● New email generator
● New credit card number generator
● New national ID number generator
Development Underway
● Random direct DB column lookups
● On-demand TDM via TAF integration
● Provisioner for Splunk test data
● KNIME node test data integration
Learn and Share
IRI Data Masking Solutions
Data Masking How-to Articles
LinkedIn Data Masking Group
LinkedIn Test Data Group
Test Data How-to Articles
IRI Test Data Solutions
Voracity Platform Resources
IRI Mask/Test Tech Talk Videos