Addressing Data Privacy Challenges In Global Applications

A Technical Solution Approach

Sambit Banerjee https://www.linkedin.com/in/sambitkbanerjee

October 15, 2016


1. Overview

Maintaining data privacy while running applications in multiple countries is often a challenge for global organizations. In today’s global economy, sharing data across international boundaries is extremely important. Anti-Money Laundering (AML) regulation relies on such data sharing, while Data Privacy (DP) regulation aims to protect the personal data of customers (often referred to as Personally Identifiable Information, i.e., PII data) from identity theft and other misuse. These two requirements compete with each other. Most countries have mandated reasonable regulatory measures that balance AML and DP, but some countries have imposed a further restriction: the PII data of their citizens / clients must remain within their geographical borders. This article calls such countries ‘DP countries’.

Most global organizations comply with these regulatory mandates by deploying end-to-end system (application and database) instances within the DP countries, so that the PII data of the clients of those countries physically remains there, and only non-PII and aggregated client data is sent across the border to the countries of their global corporate offices.

Besides the complexities and the additional cost of deploying such distributed instances of the same application in multiple

countries, there are a few common operational challenges involving the DP countries.

1) Typically, such globally deployed systems are designed and developed in the countries with global corporate offices, where Subject Matter Experts (SMEs) and key development resources are located. These resources may need access to the instances hosted in the DP countries to investigate and resolve application issues. Common measures adopted to work around the remote access restrictions for the DP countries include:

a. Series of Q&A and hand-holding between the local support teams of the DP countries and the global SMEs located

outside the DP countries

b. At times, local support teams manually remove any PII data from log and data files before providing those files to the global SMEs for investigation, in order to comply with the local DP rules

c. Some level of screen sharing takes place after the local support teams do their due diligence to ensure that the PII

data of the DP country is not accidentally exposed to the global SMEs



These measures make life difficult for the global SMEs, as they:

o depend on the availability (e.g., due to different time zones) and standard operating procedures of the local

support teams

o do not get a holistic view of the current state of the data and the system of the end-to-end application instance of

the DP country because they receive pieces of information one at a time based on the Q&A and hand-holding

o often resort to writing ad-hoc queries, scripts, etc. for the local support teams in order to obtain non-PII data from

the DP country to understand potential data issues.

This process of analysis, identification of root cause, and final resolution of the issues related to the application instances

in the DP countries often turns out to be time consuming, less cohesive and less efficient.

2) While some core support members (e.g., data center staff, system administrators, DBAs) may be located within the DP country, some, if not all, of the production support teams (e.g., L1/L2/L3) may be located outside the DP country as a cost-saving measure. They may need access to the application instance of the DP country for preliminary investigation of application issues before seeking the assistance of the global SMEs.

3) Functional or business team members from the corporate office may need to assist the end-users located inside the DP country and investigate application issues related to various business rules and associated data elements.

As a result, tangible and intangible costs add up: operational constraints place an additional burden on multiple local and global teams, all to ensure that no PII data is exposed outside the border of the DP country.

All these issues can be addressed seamlessly if there is a solution to dynamically and conditionally suppress PII and other

sensitive data elements based on well-defined criteria, while accessing data.

Fortunately there is one, which we’ll discuss in this article with the help of a standard business requirement, and an end-to-end

technical solution to meet business requirements and address operational issues.



2. Business Requirement

XYZ Financial Corp. has –

Its corporate office in a country ‘CNTRY_Non_DP’ that does not restrict sharing of data with other countries.

Its presence in a DP country ‘CNTRY_DP’ that restricts sharing of data of its clients with other countries by mandating that

the PII data of its clients cannot cross the geographical border of the DP country. So, the XYZ Financial Corp. has deployed

an end-to-end application instance in a data center within CNTRY_DP, in order for its local staff to operate on the data of

the clients of the CNTRY_DP country stored in the in-country application / database instance.

The following tables contain master and account data of a few clients of the country CNTRY_DP to describe the DP rules.

Table 1. Client Master Data

(ID, Type and Domicile Country form the Client Identification; the remaining columns describe the Primary Contact of the Client.)

ID | Type | Domicile Country | First Name | Last Name | Middle Name | Date of Birth | Address | Phone Number | email
1001 | Individual | CNTRY_DP | John | Doe | J. | 2-Jan-1960 | 123 Main St, City1, State1 | 1111-2222-3333 | [email protected]
2002 | Individual | CNTRY_DP | Ann | Wu | | 12-Aug-1970 | 678 River Ln, City2, State2 | 9988-776-6XYZ | js12@xyz.net

Table 2. Client Account Data

(All columns after Client ID are Account Details.)

Client ID | ID | Number | Type | Full Name | Date of Birth | Address | Phone Number | email | Relation with Primary | Domicile Country
1001 | 1 | 445566777 | CreditCard | John Doe | 2-Jan-1960 | 123 Main St, City1, State1 | 1111-2222-3333 | [email protected] | Self | CNTRY_DP
1001 | 2 | 445561727 | CreditCard | Jane Doe | 5-Mar-1962 | 123 Main St, City1, State1 | 1111-2222-3344 | [email protected] | Spouse | CNTRY_DP
2002 | 3 | 33245 | Savings | Ann Wu | 12-Aug-1970 | 678 River Ln, City2, State2 | 998-8776-6XYZ | [email protected] | Self | CNTRY_DP
2002 | 4 | 22567 | Checking | Ann Wu | 12-Aug-1970 | 678 River Ln, City2, State2 | 998-8776-6XYZ | [email protected] | Self | CNTRY_DP
2002 | 5 | S1234 | Student Loan | David Wu | 21-Sep-1997 | 888 College Blvd, X-City, Y-State | 998-8745-6XYZ | [email protected] | Son | CNTRY_Non_DP
2002 | 6 | 6677889901 | CreditCard | Amy Wu | 18-May-2000 | 678 River Ln, City2, State2 | 998-8123-4XYZ | [email protected] | Daughter | CNTRY_DP



The rules governing these data elements state that –

1) All accounts of the client ID 1001 are created in CNTRY_DP, and hence the associated PII data cannot be accessed from

outside the CNTRY_DP.

2) The same rule applies to the client ID 2002, with the exception that the account of a family member ‘David Wu’ is created

in CNTRY_Non_DP as he has taken a student loan in CNTRY_Non_DP where he is attending college. The XYZ Financial

Corp. has used other accounts of the client ID 2002 as a security for granting the student loan.

a. Although the student loan account is tied to the client ID 2002, the account holder ‘David Wu’ has opened the

student loan account in the CNTRY_Non_DP and also resides in CNTRY_Non_DP, which makes him a client of the

CNTRY_Non_DP country.

b. Due to AML requirements of CNTRY_DP, the same student loan account record tied to the client ID 2002 should

also be present in the in-country application instance of CNTRY_DP, especially due to the fact that CNTRY_Non_DP

does not restrict sharing of its client information with other countries.

c. That means while the users from the CNTRY_Non_DP country cannot access PII data of other accounts of the client

ID 2002 from the CNTRY_DP in-country instance, they should still be able to access the account information related

to the student loan account along with the non PII data of other accounts of the client ID 2002.

So, the user interface of the CNTRY_DP application instance should reflect the following views:-



Figure 1. View of the CNTRY_DP users accessing data of the client ID 2002 from within the CNTRY_DP border



Figure 2. View of the CNTRY_Non_DP users accessing data of the client ID 2002 from outside the CNTRY_DP border. (The highlighted fields indicate redacted PII data, per the business requirement)



3. Solution Design

Let's take a look at the system components of a standard 3-tier application instance hosted in a DP country, and associated data

access points.

Figure 3. High Level Architecture of an Application Instance hosted in a DP Country



Figure 3 should be self-explanatory to those involved in system implementation in some capacity. A few notations used in this

diagram are –

i) The boxes with the text ‘John Doe’ represent plain text PII data as it moves through different system components.

ii) RW FID – Database Functional user ID with Read-Write access to the database

iii) RO FID – Database Functional user ID with Read-Only access to the database

3.1. Design Considerations

Multiple access points for consumption of the same data

Masking the same sensitive data in different access points could lead to repetitive efforts and potential data

inconsistency

So, sensitive data should be masked (i.e., redacted) at the source while retrieving the same data in order to achieve

optimal efficiency and effectiveness

3.2. Design Principles for Data Redaction

Sensitive data should be redacted without changing the actual data stored in the production database.

Only presentation of the target data for redaction should be affected

No change in existing data model or schema FIDs should be required

Implement configurable and rule based design for data redaction as much as possible. Avoid writing custom code at

the application layers for data redaction unless absolutely needed. This addresses portability and manageability

aspects of the application design.



Separation of controls: implement the rule-based policies under a DB FID other than the DB FID owning the data.

It should be possible to enable, disable or modify the data redaction configuration/rules at any time without affecting the BAU operations on the core data. This is probably less important in most operational scenarios, but nice to have.

It would be great to be technology agnostic to meet these design principles. However, not all database vendors have seamlessly

addressed the challenge of data redaction effectively as yet. Oracle is among the database vendors to have addressed this

challenge at the core of its database engine, meeting these design principles.

So, despite the prospect of criticism for not being technology agnostic and for vendor lock-in, I am using the Oracle database and its relevant features for the technical solution described in this article.

3.3. Data Redaction Solution Components using Oracle

Key Characteristics

Database session context based, i.e., dynamically applicable to each user session

Column level Data Redaction - using ‘Oracle Data Redaction’ (i.e., DBMS_REDACT package) policies for column(s)

of a table

Row level security for sensitive data - using ‘Oracle Virtual Private Database (VPD)’ (i.e., DBMS_RLS package)

policies for conditionally filtering out rows of a table

Redacted data elements at the table level are automatically reflected / propagated at the view (built upon those

tables) level

Effective immediately upon creation / modification of policies



Good News - Available in Oracle Database 11.2.0.4 and upwards, and comes with core Oracle Database

installation package.

Security Considerations

Users with DBA privileges, and certain database administrative operations (e.g., backup / recovery / replication,

etc.) are exempt from Data Redaction using these DBMS packages.

It is possible for Oracle database users with privileged access to break Data Redaction by inference. Oracle

documentation and some other publicly available articles have explained it well. However, it is not among the

best security practices to provide users and developers with privileged access to production data.

Some more disclaimers on Oracle Redaction limitations can be found in Oracle documentation, but I think those

considerations should be addressed in most well-designed database applications anyway.

Now that we know what tool and database we are going to use for our solution, let’s discuss how we’ll use it. As seen in

the architecture diagram in Figure 3, there are two modes of accessing data – 2-tier, and 3-tier (or, N-tier for that matter).

3.4. 2-Tier Access Mode

This mode covers access to data over 2-tier client-server protocols, where a user logs in to the Oracle database directly using a database FID and tools such as SQL*Plus or SQL Developer. In most cases, these tools use the SQL*Net protocol, and some use ODBC / JDBC protocols.

Solution Design Principles for 2-tier access mode

As a best practice, the FID owning the data should not be used to access production data directly over 2-tier client-server protocols. Use FIDs that do not own the data, such as a RO FID.



As the RO FID is also used by the global and local support resources, PII data of the DP Country should be redacted

when the RO FID is used to access the data.

Data redaction policies are configured and dynamically triggered at the run time based on which FID is accessing the

data

Key implementation aspect: the evaluation condition of the Oracle redaction policy for the database engine is

SYS_CONTEXT('USERENV', 'SESSION_USER') != '<FID owning data>'

This is the simplest form of implementation of data redaction

3.5. 3-Tier (or, N-Tier) Access Mode

In 3-tier / n-tier systems, a user logs in to an application from a front-end UI, e.g., a browser, with a fine-grained, entitlement-based individual user ID. Upon receipt of the user request from the browser, the application components in the middle tiers:

connect to the database directly using a generic database FID over 2-tier client-server protocols

access the data from the database

send the data to the users via application processing and presentation layers (implemented in [n-1] tiers) of the

n-tier architecture.

Solution for this access mode needs more considerations than the simplest form of implementation of data redaction,

including the data flow mechanism of n-tier architecture as explained below.



Figure 4.

Fine-grained management of database sessions is done by separating the notion of database sessions (user

handles) from connections (server handles).

Use of sessions (typically stateless) with database connection pools is implemented in the multi-threaded mid-

tiers of n-tier architecture.

Each thread can maintain a session to the database. The actual connections to the database are maintained by

the connection pool, and these connections (including the pool of dedicated database server processes) are

shared among all the threads in the middle tier(s).

Stateless sessions are serially reusable across mid-tier threads. That means, after a thread is done processing a

database request on behalf of a 3-tier user, the same database session can be reused for a completely different

request on behalf of a completely different 3-tier user.



These aspects ensure performance, security, scalability and endurance of the mid-tiers of n-tier architecture.
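The serial reuse described above has a direct consequence for session-level attributes: whatever one request sets on a pooled session is still there when the next request borrows it. The following is a minimal Python sketch of that behaviour. It uses a hypothetical in-memory `FakeSession` and `SessionPool` rather than a real Oracle driver, so the class and attribute names are illustrative assumptions only.

```python
from collections import deque


class FakeSession:
    """Stand-in for a pooled database session (hypothetical, no real driver)."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.client_identifier = ""  # session-level attribute that survives reuse


class SessionPool:
    """Hands out sessions one at a time and takes them back for reuse."""

    def __init__(self, size):
        self._free = deque(FakeSession(i) for i in range(size))

    def acquire(self):
        return self._free.popleft()

    def release(self, session):
        # The session goes back as-is: nothing is reset automatically.
        self._free.append(session)


def handle_request(pool, tag=None):
    """Serve one request, optionally setting the session tag first.

    Returns the tag that was in effect while the request ran.
    """
    session = pool.acquire()
    try:
        if tag is not None:
            session.client_identifier = tag
        return session.client_identifier
    finally:
        pool.release(session)


pool = SessionPool(size=1)
first = handle_request(pool, tag="REDACT")  # user outside the DP border
stale = handle_request(pool)                # next user, tag not touched
reset = handle_request(pool, tag="")        # next user, tag explicitly cleared
print(first, repr(stale), repr(reset))
```

The second request inherits the tag of the first one, which is why any per-user marker placed on a pooled session has to be set or cleared on every request rather than only when redaction is wanted.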

Solution Design Principles for 3-tier / N-tier access mode

Either a RW FID, same as the data owner, or a RO FID, is used by the application code to access production data

directly over 2-tier client-server protocols. The database connections established by these FIDs are typically

shared (i.e., connection pooling) among the user sessions.

As a result, all user sessions established at the Presentation layer of the application are bound to the database

sessions established by the same database FID.

However, based on certain criteria, the same data must be redacted for some users and should remain un-

redacted for other users. Applications can implement various strategies (including location awareness using

browser IP address) to determine such criteria for the users.

However, as the Data Redaction should take place before the data comes out of the database (i.e., at the

source), the database engine needs to be able to differentiate between the user sessions requiring data

redaction and the user sessions without data redaction. The mechanism of this differentiator is ‘session-

tagging’, implemented using the CLIENT_IDENTIFIER property of the associated database session.

Both the application and the database engine need to do their part to make ‘session-tagging’ work properly.

Key implementation aspect

1) Application tags a new session for data redaction upon user login to the application by setting the

CLIENT_IDENTIFIER property of the session with the ‘<Redaction identifier string>’.

Determining whether a user session needs redaction could include, but is not limited to, location-awareness logic in conjunction with the user’s profile information.

Pseudo-logic for the post-login validation code of the application that tags a new session for data redaction (Java, tested with ojdbc5):



// User login and entitlement check successful.
// Get the handle of the current DB session, typically created previously with a JDBC call such as
// DriverManager.getConnection("jdbc:oracle:thin:...:..")
Connection conn = (Connection) get_CurrentDBSession();
String[] metrics = new String[OracleConnection.END_TO_END_STATE_INDEX_MAX];

// The three booleans below stand for application logic; the third one is the
// return value of the location-awareness logic.
if (thisAppInstanceIsDpCountryInstance
        && !userCountryCodeIsDpCountry
        && userIsLoggedInFromOutsideBorderOfThisAppInstance) {
    // Set CLIENT_IDENTIFIER to the <Redaction identifier string>
    metrics[OracleConnection.END_TO_END_CLIENTID_INDEX] = "REDACT";
} else {
    // Reset CLIENT_IDENTIFIER when using an existing session handle, e.g., from the DB connection pool
    metrics[OracleConnection.END_TO_END_CLIENTID_INDEX] = "";
}

// Update the DB session to set / unset data redaction
((OracleConnection) conn).setEndToEndMetrics(metrics, (short) 0);

2) Database engine evaluates the condition of the data redaction policy for the corresponding session as –

(SYS_CONTEXT('USERENV', 'SESSION_USER') != '<FID owning data>' OR
SYS_CONTEXT('USERENV', 'CLIENT_IDENTIFIER') = '<Redaction identifier string>')
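The two halves of session-tagging can be walked through end to end in a few lines of Python. This is only a sketch under stated assumptions: the session object is a plain stand-in, the function names are hypothetical, and the FID names `APPMAIN` and `APPRO` and the tag `REDACT` are the ones used by the prototype later in this article. Python Oracle drivers such as cx_Oracle expose a comparable `client_identifier` connection attribute, but that mapping should be checked against the driver's documentation. SQL's three-valued logic is not modelled; an unset identifier is simply treated as not equal to the tag.

```python
from types import SimpleNamespace

OWNER_FID = "APPMAIN"      # FID owning the data in the prototype
REDACTION_TAG = "REDACT"   # the <Redaction identifier string>


def needs_redaction(instance_is_dp, user_country_is_dp, logged_in_from_outside):
    """Application side: the three-part post-login condition."""
    return instance_is_dp and not user_country_is_dp and logged_in_from_outside


def tag_session(session, redact):
    """Application side: set or clear the tag on a (possibly pooled) session."""
    # Always assign, so that a tag left by a previous user is overwritten.
    session.client_identifier = REDACTION_TAG if redact else ""


def policy_applies(session):
    """Database side: Python rendering of the redaction policy condition."""
    return (session.session_user != OWNER_FID
            or session.client_identifier == REDACTION_TAG)


# Stand-in for a database session opened by the application with the RW FID.
app_session = SimpleNamespace(session_user="APPMAIN", client_identifier="")

tag_session(app_session, needs_redaction(True, False, True))
outside_result = policy_applies(app_session)
print(outside_result)   # True: global user outside the border

tag_session(app_session, needs_redaction(True, True, False))
inside_result = policy_applies(app_session)
print(inside_result)    # False: local user inside the border

# Direct 2-tier access with the RO FID never carries a tag, yet is redacted.
ro_session = SimpleNamespace(session_user="APPRO", client_identifier="")
ro_result = policy_applies(ro_session)
print(ro_result)        # True
```

The first clause of the condition alone is the 2-tier rule; the second clause is what lets a single shared RW FID serve both redacted and un-redacted application users.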

With this design approach involving Oracle Data Redaction tools, the high level architecture of the standard 3-tier system components (hosted in a DP country) in Figure 3 above should look like Figure 5 below. The boxes with the text ‘J****D**’ represent redacted PII data as it moves through different system components.



Figure 5.

Now that we have a high level design in place, we need to develop a prototype to demonstrate that this design actually works. I am a big fan of ‘seeing is believing’ (well, with reasonable expectations). So, let’s build the prototype.



4. Development & Testing of a Prototype Application

For this prototype, we’ll do configuration and coding for both 2-tier access and 3-tier access. For 3-tier, we need to build something to function as the middle tiers (web and application). As we are focusing on the data redaction aspect in this article, we’ll bypass standard authentication, authorization, secure protocols, and all the other heavyweight features that are otherwise included in a standard enterprise application. For faster time to market (or maybe because I am simply lazy and want to avoid developing, configuring and deploying a full-scale Java-based application tier in a web and app server), I have chosen to develop a lightweight Python program to simulate the middle tier of a 3-tier system.

Let’s get started!

4.1. Database Setup

Here we set up the Oracle database FIDs with the necessary privileges, create tables with data per Tables 1 & 2 in the ‘Business Requirement’ section above, and set up the Oracle Redaction & VPD policies. This is the heart of the solution.

1. Set up Oracle database user IDs with necessary privileges

At this point, I assume you have access to an Oracle database instance with DBA privilege. Per the design principles above, we’ll create 3 Oracle FIDs.

APPMAIN – to own the data and to connect from the app as the RW FID for data access. Per the design, data accessed by APPMAIN is conditionally redacted.

APPRO – to act as the RO FID. Per the design, data accessed by APPRO is always redacted.

APPSEC – to act as the security administrator FID that owns the Data Redaction and VPD policies. This gives better security and control, meets the principle of separation of duties, and keeps your Info Security Officers and auditors happy.

Set the Oracle environment variables on your server and log in to the Oracle database with DBA privilege.

$ sqlplus / as sysdba

-- Run the following statements at the ‘SQL>’ command prompt



-- (Optional) Create a small tablespace for your prototype data. Otherwise, you can use an existing

-- tablespace and replace ‘myappdata’ with the name of that tablespace in the following SQL statements

SQL> create tablespace myappdata datafile '<path_of_datafile>/myappdata01.dbf' size 10M;

SQL> create user appmain identified by appmain default tablespace myappdata quota unlimited on myappdata;

SQL> grant create session, create table, create view to appmain;

SQL> create user appro identified by appro default tablespace myappdata quota unlimited on myappdata;

SQL> grant create session, create synonym to appro;

SQL> create user appsec identified by appsec default tablespace myappdata quota unlimited on myappdata;

SQL> grant create session, create procedure to appsec;

SQL> grant execute on DBMS_REDACT to appsec;

SQL> grant execute on DBMS_RLS to appsec;

--

-- The following grant is required to view session-tagging info of all connected database sessions

SQL> grant select on v_$session to appsec;

2. Create database tables, and populate them with data per Tables 1 & 2 above

a) Connect as appmain and run the DDL and DML statements

SQL> connect appmain/appmain

-- Run the following statements at the ‘SQL>’ command prompt

--------------------------------------------------------

-- DDL for Table APPMAIN.CLIENT_MASTER

--------------------------------------------------------

SQL> CREATE TABLE "APPMAIN"."CLIENT_MASTER" (

"CLIENT_ID" VARCHAR2(20) NOT NULL ENABLE,

"PRIM_FIRST_NAME" VARCHAR2(20),

"PRIM_LAST_NAME" VARCHAR2(20),

"PRIM_MIDDLE_NAME" VARCHAR2(20),

"PRIM_DOB" DATE,

"CLIENT_DOMICILE_COUNTRY" VARCHAR2(20),

"PRIM_ADDRESS" VARCHAR2(50),

"PRIM_PHONE_NO" VARCHAR2(20),

"PRIM_EMAIL" VARCHAR2(30),

"CLIENT_TYPE" VARCHAR2(20),

CONSTRAINT "CLIENT_MASTER_PK" PRIMARY KEY ("CLIENT_ID") USING INDEX ENABLE );

--------------------------------------------------------

-- DMLs for Table APPMAIN.CLIENT_MASTER. These insert the same data as Table 1.

--------------------------------------------------------



SQL> Insert into APPMAIN.CLIENT_MASTER (CLIENT_ID,PRIM_FIRST_NAME,PRIM_LAST_NAME,PRIM_MIDDLE_NAME,PRIM_DOB,

CLIENT_DOMICILE_COUNTRY,PRIM_ADDRESS,PRIM_PHONE_NO,PRIM_EMAIL,CLIENT_TYPE) values ('1001','John','Doe',

'J.', to_date('02-JAN-60','DD-MON-RR'),'CNTRY_DP','123 Main St, City1, State1', '111122223333',

'[email protected]','Individual');

SQL> Insert into APPMAIN.CLIENT_MASTER (CLIENT_ID,PRIM_FIRST_NAME,PRIM_LAST_NAME,PRIM_MIDDLE_NAME,PRIM_DOB,

CLIENT_DOMICILE_COUNTRY,PRIM_ADDRESS,PRIM_PHONE_NO,PRIM_EMAIL,CLIENT_TYPE) values ('2002','Ann','Wu',null,

to_date('12-AUG-70','DD-MON-RR'),'CNTRY_DP','678 River Ln, City2, State2','99887766XYZ','[email protected]',

'Individual');

SQL> COMMIT;

--------------------------------------------------------

-- DDL for Table APPMAIN.CLIENT_ACCT

--------------------------------------------------------

SQL> CREATE TABLE "APPMAIN"."CLIENT_ACCT" (

"CLIENT_ID" VARCHAR2(20) NOT NULL ENABLE,

"ACCT_ID" VARCHAR2(20) NOT NULL ENABLE,

"ACCT_NUMBER" VARCHAR2(20) NOT NULL ENABLE,

"ACCT_TYPE" VARCHAR2(20) NOT NULL ENABLE,

"ACCT_FULL_NAME" VARCHAR2(20),

"ACCT_DOB" DATE,

"ACCT_ADDRESS" VARCHAR2(50),

"ACCT_PHONE_NO" VARCHAR2(20),

"ACCT_EMAIL" VARCHAR2(30),

"RELN_WITH_PRIM" VARCHAR2(20),

"ACCT_DOMICILE_COUNTRY" VARCHAR2(20),

CONSTRAINT "CLIENT_ACCT_PK" PRIMARY KEY ("CLIENT_ID", "ACCT_ID") USING INDEX ENABLE );

--------------------------------------------------------

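-- DMLs for Table APPMAIN.CLIENT_ACCT. These insert the same data as Table 2.

--------------------------------------------------------

SQL> Insert into APPMAIN.CLIENT_ACCT (CLIENT_ID,ACCT_ID,ACCT_NUMBER,ACCT_TYPE,ACCT_FULL_NAME,ACCT_DOB,
ACCT_ADDRESS,ACCT_PHONE_NO,ACCT_EMAIL,RELN_WITH_PRIM,ACCT_DOMICILE_COUNTRY) values ('1001','1',
'445566777','CreditCard','John Doe',to_date('02-JAN-60','DD-MON-RR'),'123 Main St, City1, State1',
'111122223333','[email protected]','Self','CNTRY_DP');

SQL> Insert into APPMAIN.CLIENT_ACCT (CLIENT_ID,ACCT_ID,ACCT_NUMBER,ACCT_TYPE,ACCT_FULL_NAME,ACCT_DOB,
ACCT_ADDRESS,ACCT_PHONE_NO,ACCT_EMAIL,RELN_WITH_PRIM,ACCT_DOMICILE_COUNTRY) values ('1001','2',
'445561727','CreditCard','Jane Doe',to_date('05-MAR-62','DD-MON-RR'),'123 Main St, City1, State1',
'111122223344','[email protected]','Spouse','CNTRY_DP');

SQL> Insert into APPMAIN.CLIENT_ACCT (CLIENT_ID,ACCT_ID,ACCT_NUMBER,ACCT_TYPE,ACCT_FULL_NAME,ACCT_DOB,
ACCT_ADDRESS,ACCT_PHONE_NO,ACCT_EMAIL,RELN_WITH_PRIM,ACCT_DOMICILE_COUNTRY) values ('2002','3',
'33245','Savings','Ann Wu',to_date('12-AUG-70','DD-MON-RR'),'678 River Ln, City2, State2',
'99887766XYZ','[email protected]','Self','CNTRY_DP');

SQL> Insert into APPMAIN.CLIENT_ACCT (CLIENT_ID,ACCT_ID,ACCT_NUMBER,ACCT_TYPE,ACCT_FULL_NAME,ACCT_DOB,
ACCT_ADDRESS,ACCT_PHONE_NO,ACCT_EMAIL,RELN_WITH_PRIM,ACCT_DOMICILE_COUNTRY) values ('2002','4',
'22567','Checking','Ann Wu',to_date('12-AUG-70','DD-MON-RR'),'678 River Ln, City2, State2',
'99887766XYZ','[email protected]','Self','CNTRY_DP');

SQL> Insert into APPMAIN.CLIENT_ACCT (CLIENT_ID,ACCT_ID,ACCT_NUMBER,ACCT_TYPE,ACCT_FULL_NAME,ACCT_DOB,
ACCT_ADDRESS,ACCT_PHONE_NO,ACCT_EMAIL,RELN_WITH_PRIM,ACCT_DOMICILE_COUNTRY) values ('2002','5',
'S1234','Student Loan','David Wu',to_date('21-SEP-97','DD-MON-RR'),'888 College Blvd, X-City, Y-State',
'99887456XYZ','[email protected]','Son','CNTRY_Non_DP');

SQL> Insert into APPMAIN.CLIENT_ACCT (CLIENT_ID,ACCT_ID,ACCT_NUMBER,ACCT_TYPE,ACCT_FULL_NAME,ACCT_DOB,
ACCT_ADDRESS,ACCT_PHONE_NO,ACCT_EMAIL,RELN_WITH_PRIM,ACCT_DOMICILE_COUNTRY) values ('2002','6',
'6677889901','CreditCard','Amy Wu',to_date('18-MAY-00','DD-MON-RR'),'678 River Ln, City2, State2',
'99881234XYZ','[email protected]','Daughter','CNTRY_DP');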

SQL> COMMIT;

--------------------------------------------------------

-- DDL for Table APPMAIN.USER_TBL

-- This table is to simulate mapping between users and their locations.

--------------------------------------------------------

SQL> CREATE TABLE "APPMAIN"."USER_TBL" (

"USER_ID" VARCHAR2(10) NOT NULL ENABLE,

"USER_LOCATION" VARCHAR2(20) NOT NULL ENABLE,

CONSTRAINT "USER_TBL_PK" PRIMARY KEY ("USER_ID") USING INDEX ENABLE );

--------------------------------------------------------

-- DMLs for Table APPMAIN.USER_TBL. 2 users for CNTRY_DP and 2 users for CNTRY_Non_DP.

--------------------------------------------------------

SQL> Insert into APPMAIN.USER_TBL (USER_ID,USER_LOCATION) values ('user1','CNTRY_Non_DP');

SQL> Insert into APPMAIN.USER_TBL (USER_ID,USER_LOCATION) values ('user2','CNTRY_DP');

SQL> Insert into APPMAIN.USER_TBL (USER_ID,USER_LOCATION) values ('user3','CNTRY_DP');

SQL> Insert into APPMAIN.USER_TBL (USER_ID,USER_LOCATION) values ('user4','CNTRY_Non_DP');

SQL> COMMIT;

b) Grant RO access and create synonyms

-- Run the following commands at the ‘SQL>’ command prompt

SQL> connect appmain/appmain

SQL> grant select on appmain.client_master to appro;

SQL> grant select on appmain.client_acct to appro;

SQL> grant select on appmain.user_tbl to appro;

SQL>

SQL> connect appro/appro

SQL> create synonym client_master for appmain.client_master;

SQL> create synonym client_acct for appmain.client_acct;

SQL> create synonym user_tbl for appmain.user_tbl;



3. Set up Oracle Redaction & VPD policies

Once a Data Redaction policy is created, it is automatically enabled and redacts data with immediate effect. Connect as appsec and run the following statements at the ‘SQL>’ command prompt to set up the Oracle Redaction and VPD policies.

SQL> connect appsec/appsec

------------------------------------------

-- For Column Level Security - using Oracle Redact

-- This is to redact client master data - at the column level (i.e., top part) in the UI

------------------------------------------

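SQL> BEGIN
  DBMS_REDACT.ADD_POLICY (
    object_schema => 'APPMAIN',
    object_name => 'CLIENT_MASTER',
    policy_name => 'CLIENT_REDACT',
    expression => 'SYS_CONTEXT(''USERENV'', ''SESSION_USER'') != ''APPMAIN'' OR
                   SYS_CONTEXT(''USERENV'', ''CLIENT_IDENTIFIER'') = ''REDACT'' ',
    column_name => 'PRIM_DOB',
    function_type => DBMS_REDACT.PARTIAL,
    function_parameters => 'm12d31y2016'); -- set dates to a specific value instead of masking chars

  -- Add the remaining columns to the same policy. Where the listing does not show a
  -- pattern, the '(.)' pattern of PRIM_FIRST_NAME is assumed.
  DBMS_REDACT.ALTER_POLICY (
    object_schema => 'APPMAIN', object_name => 'CLIENT_MASTER', policy_name => 'CLIENT_REDACT',
    action => DBMS_REDACT.ADD_COLUMN, column_name => 'PRIM_FIRST_NAME',
    function_type => DBMS_REDACT.REGEXP, function_parameters => NULL,
    regexp_pattern => '(.)', regexp_replace_string => '*');
  DBMS_REDACT.ALTER_POLICY (
    object_schema => 'APPMAIN', object_name => 'CLIENT_MASTER', policy_name => 'CLIENT_REDACT',
    action => DBMS_REDACT.ADD_COLUMN, column_name => 'PRIM_LAST_NAME',
    function_type => DBMS_REDACT.REGEXP, function_parameters => NULL,
    regexp_pattern => '(.)', regexp_replace_string => '*');
  DBMS_REDACT.ALTER_POLICY (
    object_schema => 'APPMAIN', object_name => 'CLIENT_MASTER', policy_name => 'CLIENT_REDACT',
    action => DBMS_REDACT.ADD_COLUMN, column_name => 'PRIM_MIDDLE_NAME',
    function_type => DBMS_REDACT.REGEXP, function_parameters => NULL,
    regexp_pattern => '(.)', regexp_replace_string => '*');
  DBMS_REDACT.ALTER_POLICY (
    object_schema => 'APPMAIN', object_name => 'CLIENT_MASTER', policy_name => 'CLIENT_REDACT',
    action => DBMS_REDACT.ADD_COLUMN, column_name => 'PRIM_PHONE_NO',
    function_type => DBMS_REDACT.REGEXP, function_parameters => NULL,
    regexp_pattern => '(.)', regexp_replace_string => '9');
  DBMS_REDACT.ALTER_POLICY (
    object_schema => 'APPMAIN', object_name => 'CLIENT_MASTER', policy_name => 'CLIENT_REDACT',
    action => DBMS_REDACT.ADD_COLUMN, column_name => 'PRIM_EMAIL',
    function_type => DBMS_REDACT.REGEXP, function_parameters => NULL,
    regexp_pattern => DBMS_REDACT.RE_PATTERN_EMAIL_ADDRESS,
    regexp_replace_string => DBMS_REDACT.RE_REDACT_EMAIL_ENTIRE);
  DBMS_REDACT.ALTER_POLICY (
    object_schema => 'APPMAIN', object_name => 'CLIENT_MASTER', policy_name => 'CLIENT_REDACT',
    action => DBMS_REDACT.ADD_COLUMN, column_name => 'PRIM_ADDRESS',
    function_type => DBMS_REDACT.REGEXP, function_parameters => NULL,
    regexp_pattern => '(.)', regexp_replace_string => '*');
END;
/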
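To get a feel for what these column masks produce, the two styles used in the policy can be approximated in plain Python. This is a sketch, not Oracle's implementation: the function names are hypothetical, Python's `re` module stands in for Oracle's regular expression engine, and the phone example assumes that every character is matched, as with the `'(.)'` pattern shown for the first name.

```python
import re
from datetime import date


def mask_all_chars(value, replacement="*"):
    """Approximate a REGEXP redaction with pattern '(.)': replace every character."""
    return None if value is None else re.sub(r"(.)", replacement, value)


def mask_date(value, month=12, day=31, year=2016):
    """Approximate the partial date redaction 'm12d31y2016': fixed month, day, year."""
    return None if value is None else date(year, month, day)


print(mask_all_chars("John"))               # ****
print(mask_all_chars("111122223333", "9"))  # 999999999999
print(mask_date(date(1960, 1, 2)))          # 2016-12-31
```

Note that the length of the masked string still reveals the length of the original value, which is one of the inference channels to keep in mind when choosing a mask.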



------------------------------------------

-- For Row Level Security - using Oracle VPD

-- This is to redact the client account data – at the row level in the UI

------------------------------------------

SQL> CREATE OR REPLACE FUNCTION POLICY_FUNC_CLIENT_ACCT

(schema IN VARCHAR2, tab IN VARCHAR2) RETURN VARCHAR2 AS

predicate VARCHAR2(4000) DEFAULT NULL;

BEGIN

IF (SYS_CONTEXT('USERENV', 'SESSION_USER') != 'APPMAIN' OR

SYS_CONTEXT('USERENV', 'CLIENT_IDENTIFIER') = 'REDACT') THEN

predicate := ' ACCT_DOMICILE_COUNTRY != ''CNTRY_DP'' ';

ELSE

NULL;

END IF;

RETURN predicate;

END;

/

SQL> BEGIN

DBMS_RLS.ADD_POLICY (

object_schema => 'APPMAIN',

object_name => 'CLIENT_ACCT',

policy_name => 'VPD_CLIENT_ACCT',

function_schema => 'APPSEC',

policy_function => 'POLICY_FUNC_CLIENT_ACCT',

statement_types => 'SELECT',

policy_type => DBMS_RLS.CONTEXT_SENSITIVE,

sec_relevant_cols_opt => DBMS_RLS.ALL_ROWS,

sec_relevant_cols => 'ACCT_NUMBER,ACCT_FULL_NAME,ACCT_DOB,ACCT_ADDRESS,ACCT_PHONE_NO,ACCT_EMAIL');

END;

/
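The combination of a predicate with `sec_relevant_cols` and `DBMS_RLS.ALL_ROWS` means that every row is still returned, but the listed columns come back as NULL for rows that fail the predicate. The following Python sketch mimics that behaviour on two rows of Table 2. It is an illustration only: the function names are hypothetical, the rows carry just a subset of the columns, and `None` stands in for NULL.

```python
SEC_RELEVANT_COLS = ("ACCT_NUMBER", "ACCT_FULL_NAME", "ACCT_DOB",
                     "ACCT_ADDRESS", "ACCT_PHONE_NO", "ACCT_EMAIL")


def policy_predicate(session_user, client_identifier):
    """Return a row test when masking applies, or None for unrestricted access."""
    if session_user != "APPMAIN" or client_identifier == "REDACT":
        return lambda row: row["ACCT_DOMICILE_COUNTRY"] != "CNTRY_DP"
    return None


def apply_column_masking(rows, predicate):
    """Keep every row; blank the sensitive columns of rows failing the predicate."""
    result = []
    for row in rows:
        out = dict(row)
        if predicate is not None and not predicate(row):
            for col in SEC_RELEVANT_COLS:
                if col in out:
                    out[col] = None
        result.append(out)
    return result


rows = [
    {"ACCT_ID": "4", "ACCT_NUMBER": "22567", "ACCT_FULL_NAME": "Ann Wu",
     "ACCT_DOMICILE_COUNTRY": "CNTRY_DP"},
    {"ACCT_ID": "5", "ACCT_NUMBER": "S1234", "ACCT_FULL_NAME": "David Wu",
     "ACCT_DOMICILE_COUNTRY": "CNTRY_Non_DP"},
]

as_owner = apply_column_masking(rows, policy_predicate("APPMAIN", ""))
as_ro = apply_column_masking(rows, policy_predicate("APPRO", ""))
print([r["ACCT_FULL_NAME"] for r in as_owner])  # ['Ann Wu', 'David Wu']
print([r["ACCT_FULL_NAME"] for r in as_ro])     # [None, 'David Wu']
```

The non-sensitive columns (`ACCT_ID`, `ACCT_DOMICILE_COUNTRY`) stay visible in both cases, which matches the requirement that non-PII data of all accounts remains accessible.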



4.2. Verification of Database Setup

Assuming the above mentioned steps for setting up database for our prototype application are successful, let’s check out if

things are working properly per the design. We can use any SQL tool for this verification step. I have used the SQL Developer

tool so that all output columns of the tables can be accommodated, even if partially, in the screen.

1) Login as appmain, query client master and client account tables. Per design, all data items should be visible as the appmain

FID owns this data.

Figure 6. Querying appmain.client_master as the appmain FID

Figure 7. Querying appmain.client_acct as the appmain FID



2) Login as appro and query client master and client account tables. Per design, PII data items of CNTRY_DP should be

redacted when accessed by the RO FID, i.e., appro, except the Student Loan account record, as it is owned by

CNTRY_Non_DP.

Figure 8. Querying appmain.client_master as the appro FID

Figure 9. Querying appmain.client_acct as the appro FID

Note that Oracle VPD does not support selective (i.e., partial/full/regex) masking of data. A masked column is always returned as NULL, which SQL Developer shows as ‘(null)’ in the PII data columns; the actual stored values can be seen in Figure 7.
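Since VPD hands back NULL for every masked column, any friendlier placeholder has to be applied by whoever reads the data. The prototype's account query does this in SQL with nvl/decode; the sketch below shows the same idea on the application side in Python 3. The mask strings are the ones used in that query, while the mask_nulls helper and the sample record are hypothetical.

```python
# Replace NULL (None) values in PII columns with fixed placeholder masks.
ACCT_MASKS = {
    "acct_number": "*****",
    "acct_full_name": "***********",
    "acct_dob": "****-**-**",
    "acct_address": "*********",
    "acct_phone_no": "*********",
    "acct_email": "****@***.***",
}


def mask_nulls(record, masks=ACCT_MASKS):
    """Return a copy of record where None in a maskable column becomes its mask."""
    return {col: (masks[col] if val is None and col in masks else val)
            for col, val in record.items()}


if __name__ == "__main__":
    # Hypothetical row as it might come back for a redacted session.
    row = {"acct_id": "A-1", "acct_type": "Savings", "acct_number": None,
           "acct_email": None, "reln_with_prim": "Self"}
    print(mask_nulls(row))
```

One limitation applies to both variants: a column that is genuinely empty in the table is indistinguishable from a masked one, so it receives the placeholder too.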

This concludes the proof of concept of 2-tier data access as per our design criteria. Next, we’ll build the middle tiers (web and

application) of our prototype application.



4.3. Middle-Tier (Web and Application) Setup

For our Python-based middle tiers (web and application), the approach is to use Python’s CGI scripting support along with the Python HTTPServer modules. Figure 10 shows the contents of the entire middle-tier setup under a folder on a Unix server.

Figure 10. Contents of the middle-tier components of the prototype

Folder and Files | Description

cgi-bin | This folder contains the Python scripts based on the CGI framework. The scripts in this folder simulate the mechanisms of the web and application tiers as shown in Figure 4 above.

cgi-bin/app3tier.py | This script is the main application code. It accepts input from a browser-based UI, retrieves data from the database, and sends a formatted response back to the browser UI for rendering.

cgi-bin/multi.py | This script establishes a listening endpoint (default port = 55123) on the server, accepts simultaneous connections from browsers, and handles each connection in its own thread.

app3tier.html | This is a simple HTML file (created with Notepad, in keeping with my motto of faster time-to-market) that serves as the point of entry into the prototype application and renders the UI in a browser. You would invoke it from your browser as “http://<your_server_name>:55123/app3tier.html”.

my.env | This file sets environment variables such as ORACLE_HOME, ORACLE_SID, PATH, LD_LIBRARY_PATH, etc. so that the Python scripts can find the Python modules and Oracle drivers. It’s a good practice to keep all these items together in a single file.

startSvr.sh | This script starts the HTTP server and the server-side runtime environment of the prototype application.



Code:- app3tier.html

<html>
<head><title>Data Redaction Prototype</title></head>
<body>
  <form action="/cgi-bin/app3tier.py" method="post" target="out_iframe">
    <table>
      <tr style="background-color:cyan;width:100%;text-align:center"><td colspan="3">Simulation of a 3-tier application hosted in a Data Privacy Country (CNTRY_DP)</td></tr>
      <tr>
        <td>Login with User Id: <input type="text" name="UserId"></td>
        <td>(Hint: Enter one of - user1, user2, user3, user4 for this exercise)</td>
        <td rowspan="2" style="padding-left:100px"> <input type="submit" value="View Client Info" /> </td>
      </tr>
      <tr>
        <td>Lookup Client ID : <input type="text" name="CID"></td>
        <td>(Hint: Enter one of - 1001, 2002 for this exercise)</td>
      </tr>
    </table>
  </form>
  <iframe name="out_iframe" src="/cgi-bin/app3tier.py" width="100%" height="80%"></iframe>
</body>
</html>

Code:- my.env

# Set ORACLE_HOME to your Oracle home path. I tested with Oracle DB 11.2.0.4.
export ORACLE_HOME=<Your_Oracle_Home_Path_Here>
export ORACLE_SID=<Your_Oracle_SID_Here>
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export PATH=$ORACLE_HOME/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:.
# Add other paths as needed in your environment. Python executable is in /usr/bin in my RHEL server environment

Code:- startSvr.sh

. ./my.env
python cgi-bin/multi.py



Code:- cgi-bin/multi.py

#!/bin/env python
import sys, SocketServer, BaseHTTPServer, CGIHTTPServer

# HTTP server that handles each incoming connection in its own thread
class MyLittleThreadingServer(SocketServer.ThreadingMixIn, BaseHTTPServer.HTTPServer):
    pass

# default listening port is 55123, if no port number is passed as command line argument
if sys.argv[1:]:
    port = int(sys.argv[1])
else:
    port = 55123

print "Oh no, not again :-(..."
myserver = MyLittleThreadingServer(('', port), CGIHTTPServer.CGIHTTPRequestHandler)

try:
    while True:
        sys.stdout.flush()
        myserver.handle_request()
except KeyboardInterrupt:
    # Ctrl+C to terminate
    print "\nFinally Done! Thank God!"

Code:- cgi-bin/app3tier.py

#!/usr/bin/python
import os, sys, cgi, cgitb, cx_Oracle

# class definition
class app3tier(object):

    # main function
    def main(self, argv):
        form = cgi.FieldStorage()
        # Get user input from the two text input fields of app3tier.html
        self.UserId = form.getvalue('UserId')
        self.CID = form.getvalue('CID')
        self.IsRedact = None
        self.UserLocation = None
        self.cur = None
        self.con = None
        self.IsOk = True
        print "Content-type:text/html\r\n\r\n"
        print "<html><head><title>Data Redaction Prototype</title>"
        print "<style> th { background-color: lavender; text-align: left } </style></head><body>"
        if (self.UserId is not None):
            try:
                # Connect to the Oracle database using the RW-FID appmain. In a real enterprise application,
                # this will likely be grabbing an existing connection from the DB connection pool.
                self.con = cx_Oracle.connect("appmain", "appmain", "<Name_of_Oracle_DB_Instance>")
                self.cur = self.con.cursor()
                # Upon successful connection to the database, determine if this session should be redacted
                self.getUserLocation()
                # If no error occurred while determining the location of the user, and evaluating data
                # redaction criteria, query the master record
                if self.IsOk:
                    self.getMasterRecord()
                # If no error occurred obtaining the master record, query the account records
                if self.IsOk:
                    self.getAcctRecords()
            except cx_Oracle.DatabaseError, exc:
                error, = exc.args
                print "Error:- ", error.message
        print "</body></html>"
        # Flush output buffer so that the browser receives all contents sent by the print statements
        sys.stdout.flush()
        # Uncomment the following line (i.e. raw_input()) if interested in viewing tagged DB sessions using a SQL tool.
        # The idea is to keep the connections established from this script from being closed in the following
        # lines of this main function. However, as there is no mechanism for these threads to accept keyboard input,
        # there will be errors thrown at the server end, which is acceptable because uncommenting this line is just
        # for testing purposes. Figure 14 shows the result of my testing by uncommenting the line with raw_input().
        # raw_input("Paused here")
        if self.cur is not None:
            self.cur.close()
        if self.con is not None:
            self.con.close()

    # This function determines the location of the user, evaluates if data redaction should apply to this
    # session, and sends the evaluation result along with the user input back to the browser.
    def getUserLocation(self):
        print "<table border=1><tr>"
        print "<td><b>User Input:- Login Id= %s, Client ID= %s</b></td>" % (self.UserId, self.CID)
        try:
            # A bind variable is used instead of string concatenation so that user input cannot alter the SQL
            self.cur.execute("select user_location from user_tbl where user_id = :user_id",
                             user_id=self.UserId)
            row1 = self.cur.fetchone()
            if row1 is None:
                self.UserLocation = 'Invalid UserId! Try again'
                self.IsRedact = False
                self.IsOk = False
            else:
                self.UserLocation = row1[0]
                # Per design, as all non CNTRY_DP users should receive redacted data over 3-tier access,
                # their corresponding database sessions should be tagged accordingly so that the Oracle
                # database engine can apply the Redaction and VPD policies appropriately.
                if (self.UserLocation != "CNTRY_DP"):
                    self.IsRedact = True
                    self.con.client_identifier = 'REDACT'
                else:
                    self.IsRedact = False
                    self.con.client_identifier = ''
        except cx_Oracle.DatabaseError, exc:
            error, = exc.args
            print "Error:- ", error.message
            self.IsOk = False
        print ("<td><b>Login Validation:- User Location= %s, Data Redaction= %s </b></td>"
               % (self.UserLocation, ("ON" if self.IsRedact else "Off")))
        print "</tr></table><p />"
        return

    # This function simply queries the database for the master record, and sends the data, as returned by
    # the database, to the browser. The to_char function is used to trim the time part of the DOB field
    # from appearing in the UI; just a little cosmetic thing. Other than that, there is no need to format
    # the redacted data here as Oracle DB engine already masks the data as defined in the Data Redaction
    # policies (i.e., using DBMS_REDACT functions) for column level data redaction.
    def getMasterRecord(self):
        try:
            stmt1 = ("select client_id, client_domicile_country, prim_first_name, prim_middle_name, "
                     "prim_last_name, client_type, to_char(prim_dob, 'YYYY-MM-DD') , prim_phone_no, "
                     "prim_address, prim_email from samapp_id.client_master where client_id = :client_id")
            # The client id is passed as a bind variable rather than concatenated into the SQL text
            self.cur.execute(stmt1, client_id=self.CID)
            print "<table border=1>"
            print '<tr><th colspan="4">Client Master Record ===== [ ', \
                  ("Column Level" if self.IsRedact else "NO"), " Data Redaction ]</th></tr>"
            for row in self.cur:
                print "<tr><th>Client ID</th><td>", row[0],"</td><th>Domicile Country</th><td>", \
                      row[1], "</td></tr>"
                print "<tr><th>First Name (Primary)</th><td", \
                      (' style="background: yellow">' if ( row[2] is not None and self.IsRedact ) else '>'), \
                      row[2],"</td><th>Middle Name (Primary)</th><td", \
                      (' style="background: yellow">' if ( row[3] is not None and self.IsRedact ) else '>'), \
                      row[3],"</td></tr>"
                print "<tr><th>Last Name (Primary)</th><td", \
                      (' style="background: yellow">' if ( row[4] is not None and self.IsRedact ) else '>'), \
                      row[4],"</td><th>Client Type</th><td>", row[5],"</td></tr>"
                print "<tr><th>DOB (Primary)</th><td", \
                      (' style="background: yellow">' if ( row[6] is not None and self.IsRedact ) else '>'), \
                      row[6],"</td><th>Phone (Primary)</th><td", \
                      (' style="background: yellow">' if ( row[7] is not None and self.IsRedact ) else '>'), \
                      row[7],"</td></tr>"
                print "<tr><th>Address (Primary)</th><td", \
                      (' style="background: yellow">' if ( row[8] is not None and self.IsRedact ) else '>'), \
                      row[8],"</td><th>Email (Primary)</th><td", \
                      (' style="background: yellow">' if ( row[9] is not None and self.IsRedact ) else '>'), \
                      row[9],"</td></tr>"
        except cx_Oracle.DatabaseError, exc:
            error, = exc.args
            print "Error:- ", error.message
            self.IsOk = False
        print "</table><p />"
        return



    # This function queries the database for account records along with some formatting on the data at the
    # database level, and sends the contents from the database to the browser. This additional formatting is
    # needed here because we are dealing with row level data redaction for the account records, and Oracle VPD
    # (i.e., using DBMS_RLS functions) is used for row level data redaction. As the DBMS_RLS package is not as
    # mature as the DBMS_REDACT package in terms of data masking, it returns NULL (shown as '(null)') for
    # masked data. Hence, the additional treatment of the null values in the SQL while retrieving account data.
    def getAcctRecords(self):
        try:
            stmt1 = ("select acct_id, acct_type, nvl(acct_number, '*****'), "
                     "nvl(acct_full_name, '***********'), decode(acct_dob, null, '****-**-**', "
                     "to_char(acct_dob, 'YYYY-MM-DD')), nvl(acct_address, '*********'), "
                     "nvl(acct_phone_no,'*********'), nvl(acct_email, '****@***.***'), reln_with_prim, "
                     "acct_domicile_country from samapp_id.client_acct where client_id = :client_id")
            # The client id is passed as a bind variable rather than concatenated into the SQL text
            self.cur.execute(stmt1, client_id=self.CID)
            print "<table border=1>"
            print '<tr><th colspan="10">Account Information for this client ===== [ ', \
                  ("Row Level" if self.IsRedact else "NO"), " Data Redaction ]</th></tr>"
            print '<tr><th width="5%">Account ID</th><th>Account Type</th>' \
                  '<th width="7%">Account Number</th>'
            print '<th>Full Name</th><th width="9%">DOB</th><th>Address</th><th>Phone #</th>'
            print '<th>Email</th><th width="5%">Relationship with Primary</th>' \
                  '<th width="7%">Account Domicile Country</th></tr>'
            for row in self.cur:
                print "<tr>"
                for j in range(0, 10):
                    # str() guards against numeric or NULL column values, which have no startswith()
                    print "<td", (' style="background: yellow">' \
                          if ( self.IsRedact and str(row[j]).startswith('***') ) else '>'), row[j], "</td>"
                print "</tr>"
        except cx_Oracle.DatabaseError, exc:
            error, = exc.args
            print "Error:- ", error.message
            self.IsOk = False
        print "</table>"
        return

# Point of entry
app3tier().main(sys.argv[1:])
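The per-cell rendering that getMasterRecord and getAcctRecords repeat could be factored into one helper. The following is a minimal sketch in Python 3, not part of the prototype: render_cell is a hypothetical name, and the HTML escaping via html.escape is an addition the original script does not perform (under Python 2 the closest equivalent would be cgi.escape).

```python
import html


def render_cell(value, redaction_on):
    """Build one <td> element, highlighting values that look masked ('***...')."""
    text = "" if value is None else str(value)
    # Highlight only when redaction applies to this session and the value is a mask string
    highlight = redaction_on and text.startswith("***")
    style = ' style="background: yellow"' if highlight else ""
    # Escape the value so database content cannot inject markup into the page
    return "<td%s>%s</td>" % (style, html.escape(text))


if __name__ == "__main__":
    sample_row = ["A-1", "Savings", "*****", None]  # hypothetical row
    print("<tr>" + "".join(render_cell(v, True) for v in sample_row) + "</tr>")
```

Converting the value with str() before calling startswith() keeps the helper safe for numeric and NULL columns.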



4.4. Testing 3-tier Prototype

1) Starting & stopping the middle-tier server.

a. Go to the folder containing the middle-tier files, as shown in Figure 10, on your Unix server.

b. Run ‘./startSvr.sh’ at the command prompt to start the middle-tier server.

c. Press Ctrl+C in the Unix server terminal to stop the middle-tier server.

Figure 11. Start, execution log, and termination of the middle-tier server

2) Access the application UI for testing.

a. Once the middle-tier server is started, open a browser session on your PC/workstation.

b. Enter the following URL: http://<hostname_running_middle-tier_server>:55123/app3tier.html

c. Enter values in the User Id and Client ID fields and click the ‘View Client Info’ button.

d. For the User Ids (user2 & user3) belonging to CNTRY_DP, data is NOT redacted. The UI looks like Figure 12.

e. For the User Ids (user1 & user4) NOT belonging to CNTRY_DP, data is redacted. The UI looks like Figure 13.
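The outcomes expected in steps (d) and (e) follow from one decision made right after login. Here is a minimal sketch of that decision in Python 3 (the prototype itself is Python 2). The USER_LOCATIONS dictionary is a hypothetical stand-in for the user_tbl lookup, and the ‘CNTRY_Non_DP’ location assigned to user1 and user4 is an assumption; the steps above only say that these users do not belong to CNTRY_DP.

```python
DP_COUNTRY = "CNTRY_DP"

# Hypothetical stand-in for user_tbl (user_id -> user_location)
USER_LOCATIONS = {
    "user1": "CNTRY_Non_DP",
    "user2": "CNTRY_DP",
    "user3": "CNTRY_DP",
    "user4": "CNTRY_Non_DP",
}


def session_tag(user_location):
    """Return the client identifier to set on the database session."""
    return "" if user_location == DP_COUNTRY else "REDACT"


def expect_redaction(user_id):
    """True/False for known users, None when the user id is not found."""
    location = USER_LOCATIONS.get(user_id)
    if location is None:
        return None
    return session_tag(location) == "REDACT"


if __name__ == "__main__":
    for uid in sorted(USER_LOCATIONS):
        print(uid, "-> redaction", "ON" if expect_redaction(uid) else "Off")
```

An unknown user id returns None here, which corresponds to the ‘Invalid UserId! Try again’ branch of the prototype.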



Figure 12: Data NOT redacted for the users belonging to CNTRY_DP



Figure 13: Data redacted for the users NOT belonging to CNTRY_DP



3) Verify session tagging at the database level for multiple concurrent sessions using 3-tier access.

a. Log in to the database as the appmain, appro, and appsec FIDs simultaneously using a SQL tool (e.g., SQL Developer, SQL*Plus, etc.).

b. Open the ‘cgi-bin/app3tier.py’ file in an editor, uncomment the line that has the ‘raw_input("Paused here")’ call, and save the file. This keeps the Python threads created by the HTTPServer running, so the corresponding database connections from the middle-tier application stay alive and we can view them with a SQL tool.

c. (Optional) The middle-tier server process does not need to be restarted for this change to take effect, but if you want, you may stop and restart it before proceeding to the next step.

d. Open a few browser sessions, enter the app UI URL, and use a different user Id in each of them. The browser UI will render the contents properly, but the terminal window of the middle-tier server process (similar to Figure 11) will show a few error messages, which is expected as a result of step (b) above.

e. Go to the SQL tool window of the database session established by the appsec FID, and run the following SQL statement.

select sys_context('USERENV', 'SESSION_USER') userid, sid, username,
       client_identifier, osuser, machine, status, program
from v$session where username like 'APP%' ;

You’ll see all established database sessions originating from 2-tier and 3-tier processes, with some of the 3-tier database sessions tagged with the redaction identifier string ‘REDACT’ (in the CLIENT_IDENTIFIER column). The output should be similar to Figure 14.



Figure 14:- Verification of session tagging from 3-tier access

For the test results shown in Figures 11, 12, 13 & 14, I used the SQL Developer tool and the IE browser from my Windows login id ‘sam1’ on my PC named ‘myWin10PC’. I used a RHEL server named ‘RHEL-DevSvr’, on which the middle-tier scripts (i.e., .sh, .py, etc.) ran as the ‘pyusr’ user id.



5. Conclusion

The outcome of these tests meets the business requirements and the technical solution design laid out in this article. By using a combination of the Oracle Redaction and VPD features, we have addressed a few common challenges in implementing cross-border Data Privacy applications.

This solution can be extended further to solve a number of other business problems involving sensitive data, even at the intra-country level. Using this approach, it is possible to identify the Oracle database sessions associated with the fine-grained, application-entitlement-based user ids of a 3 / n-tier application.

For example, the post-login application code can build the redaction identifier string as ‘REDACT_userN_locationY’, and different Oracle Redaction and VPD policies can then be defined based on the pattern of the substrings in that string. This helps establish end-to-end traceability of a user session within a 3 / n-tier application.
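A minimal sketch of how such a composite identifier could be built and taken apart again, in Python 3. Everything beyond the ‘REDACT_userN_locationY’ pattern is an assumption made for illustration: the function names, the ‘PLAIN’ prefix for non-redacted sessions, the rule that user ids must not contain an underscore, and the 64-byte cap, which reflects my understanding of Oracle's limit on CLIENT_IDENTIFIER and should be checked against your database version.

```python
MAX_CLIENT_ID_BYTES = 64  # assumed limit for CLIENT_IDENTIFIER; verify for your release


def build_identifier(user_id, location, redact):
    """Compose '<PREFIX>_<user>_<location>' for tagging a database session."""
    if "_" in user_id:
        # The underscore is the field separator, so it may only appear in the location part
        raise ValueError("user_id must not contain '_'")
    prefix = "REDACT" if redact else "PLAIN"  # 'PLAIN' is a hypothetical prefix
    tag = "%s_%s_%s" % (prefix, user_id, location)
    if len(tag.encode("utf-8")) > MAX_CLIENT_ID_BYTES:
        raise ValueError("identifier exceeds %d bytes" % MAX_CLIENT_ID_BYTES)
    return tag


def parse_identifier(tag):
    """Split a tag back into (prefix, user_id, location)."""
    prefix, user_id, location = tag.split("_", 2)
    return prefix, user_id, location


if __name__ == "__main__":
    tag = build_identifier("user1", "CNTRY_Non_DP", redact=True)
    print(tag, parse_identifier(tag))
```

A policy function could then branch on the prefix or the location substring, for instance with a LIKE 'REDACT%' comparison on the session's CLIENT_IDENTIFIER.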

If you are also in the ‘seeing is believing’ boat and would like to get your hands dirty, please feel free to use the code, configuration, and data included in this article, and add your own data and policies to test various use cases of your choice.

Finally, any feedback on this article is welcome!
