Lauri Pietarinen - What's Wrong With My Test Data

What's Wrong with My Test Data? Lauri Pietarinen Relational Consulting EuroSTAR 2008


What's Wrong with My Test Data?

Lauri Pietarinen, Relational Consulting

EuroSTAR 2008

My Background

• Tietokonepalvelu (Pension Insurance) 85-97
– Mainframe development in PL/I and DL/I environment
– Support department 87-95
  • Maintenance of prog. environment, DB2-training etc...
• AtBusiness Communications 97-04
– Internet applications
– Database design, DW-implementations, Java-programming, project management etc...
• Relational Consulting (own company) 04-
– Independent database consultant
– Specialising in test data management
• Lauri.pietarinen (at) relational-consulting.com

Customers

• Finland
– Ilmarinen (Insurance)
– Arek (Insurance)
– TietoEnator
– Area (Travel agency)
– + many others…
• Sweden
– BGC
– Alecta
– SEB

Agenda

• Why is test data management important?

• Alternatives for populating test databases

• Technical issues involved
– what is needed (scope of data)?
– subsetting issues
– de-identifying

• Case: Pension Insurance Company in Sweden

Traditional Model

[Diagram: Input, Process, Output; a Program (the Application) working against the DATABASE]

[Diagram: four environments (DEV, SYSTEST, ACCEPTANCE, PROD), each containing the same set of components: DB2, ZOS, UNIX, WIN, ORACLE, DB2UDB, .NET, BIZTALK, MQ. Labels: Database, App, Env]

Problems with test data

• Test data is not semantically valid
– errors in test programs have corrupted the database
– integrity over several systems
  • external interfaces!
• Test data is not comprehensive
– hard to build realistic test cases
• Test data cases are consumed
– contracts terminated and people declared dead
– "You can't step into the same river twice" (Heraclitus)
• Result: programs can't even be started, and effort goes into solving errors caused by faulty data

Two Different Disciplines

[Diagram: the two disciplines, TESTING and DATABASES]

How to Populate the Test DB?

[Diagram: four numbered ways (1-4) to populate TEST: SQL scripts; robot over the UI (e.g. QTP); 100% COPY from PROD; EXTRACT of a 5% subset from PROD]

How to Populate Test DB?

• Copy the full production volume into test
– (+) is comprehensive and intact
– (+) technically simple (can be done with standard tools)
– (-) heavy operation with big databases
– (-) test environment hard to use and maintain
– (-) ad hoc updates from production not possible
– (-) does not solve the problem of consumption and corruption
• Scripts
– (+) create non-existent cases
– (+) only need an SQL editor
– (-) lots of repetitive work
– (-) go out of date
• Extract subset from production
– (+) right data when needed
– (+) same technology can be used to manage the subsets
– (-) need to build home-made tools/scripts or purchase a product
– (-) expert knowledge of database structure required

How to Extract?

• Home-made tools/scripts
– many organisations have such tools/scripts/programs
– effort needed to maintain them
  • often tied to one person (who will soon be retired!)
• Generic products
– DataBee (Net 2000)
– Grid-Tools (Grid-Tools)
– Optim/Relational Tools (IBM)
– Data Express (Micro Focus)

Sample Extract

[Diagram: related tables CUSTOMERS, ORDERS, DETAILS, ITEMS]

Lots of issues still remain

• What is a test case?
– must define what is needed for the specific test
– customer, with orders or without?
– often simpler to extract superset of tables
• Finding the right cases for your test
– green haired left handed midget
– maintain library of keys and/or SQL-scripts?
• Bookkeeping (is somebody else already using this case?)
• Integrity over applications
– External parties
– 3rd party software
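The bookkeeping question in the list above (is somebody else already using this case?) can be sketched as a small reservation ledger. This is only an illustration and not something shown in the talk: the class `CaseLedger`, its method names and the case keys are all hypothetical, and a real system would keep the ledger in a shared database rather than in memory.

```python
class CaseLedger:
    """Tracks which tester currently holds which test case (hypothetical helper)."""

    def __init__(self):
        self._holder = {}  # case key -> name of the tester holding it

    def reserve(self, case_key, tester):
        """Reserve a case; return False if someone else already holds it."""
        current = self._holder.setdefault(case_key, tester)
        return current == tester

    def release(self, case_key, tester):
        """Release a case, but only if this tester is the one holding it."""
        if self._holder.get(case_key) == tester:
            del self._holder[case_key]
            return True
        return False

    def holder(self, case_key):
        """Return the current holder, or None if the case is free."""
        return self._holder.get(case_key)
```

A tester would call `reserve("C4", "Jill")` before loading or running against case C4 and `release("C4", "Jill")` once the case has been restored.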

Some Concepts (Optim)

• Extract
– start from a set of rows in start table and extract all related rows from specified tables
– use RI or "soft relations" for navigation
• Extract File
– binary format file containing extracted data
• Insert
– add rows from extract file into database
• Delete
– delete rows that were extracted
• Compare
– compare two extracts and flag deleted, inserted and modified rows
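The Extract concept above can be sketched in a few lines of Python with SQLite. This is a minimal illustration of the idea, not how Optim works internally: the schema, the `RELATIONS` list and the function names are assumptions made for the example, and the traversal is a single pass along a fixed chain of relations (no cycles, no parent lookups in both directions).

```python
import sqlite3

# Hypothetical schema modelled on a CUSTOMERS/ORDERS/DETAILS/ITEMS example.
SCHEMA = """
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE items     (item_id INTEGER PRIMARY KEY, descr TEXT);
CREATE TABLE orders    (order_id INTEGER PRIMARY KEY, cust_id INTEGER);
CREATE TABLE details   (detail_id INTEGER PRIMARY KEY, order_id INTEGER, item_id INTEGER);
"""

# Navigation paths: (source table, source column, target table, target column).
# They may come from declared RI or be "soft relations" known only to experts.
# Listed so that every source table is filled before it is used.
RELATIONS = [
    ("customers", "cust_id", "orders", "cust_id"),
    ("orders", "order_id", "details", "order_id"),
    ("details", "item_id", "items", "item_id"),
]


def fetch(conn, table, column, values):
    """Return the rows of `table` whose `column` is in `values`, as dicts."""
    if not values:
        return []
    marks = ",".join("?" * len(values))
    cur = conn.execute(f"SELECT * FROM {table} WHERE {column} IN ({marks})", values)
    names = [d[0] for d in cur.description]
    return [dict(zip(names, row)) for row in cur.fetchall()]


def extract(conn, start_table, start_column, start_keys):
    """Start from rows in the start table and follow RELATIONS to related rows."""
    result = {start_table: fetch(conn, start_table, start_column, list(start_keys))}
    for src, src_col, dst, dst_col in RELATIONS:
        keys = sorted({row[src_col] for row in result.get(src, [])})
        result[dst] = fetch(conn, dst, dst_col, keys)
    return result


def build_demo_db():
    """Create a tiny in-memory database with three customers (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "Cust1"), (2, "Cust2"), (3, "Cust3")])
    conn.executemany("INSERT INTO items VALUES (?, ?)",
                     [(500, "bolt"), (501, "nut"), (502, "washer")])
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])
    conn.executemany("INSERT INTO details VALUES (?, ?, ?)",
                     [(100, 10, 500), (101, 11, 501), (102, 12, 502)])
    return conn
```

Calling `extract(build_demo_db(), "customers", "cust_id", [1])` returns customer 1 together with its orders, their detail rows and the items those details refer to, and nothing belonging to the other customers.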

Extract/Insert

[Diagram: Cust1, Cust2 and Cust3 are extracted from ProdDb into an Extract File (binary format) and inserted into TestDb]

Regression Test

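[Diagram: the same data is inserted before each run; Prog V1 runs on Monday and Prog V2 on Tuesday; an Extract is taken after each run; Compare is run on the two extracts]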

Database Subsetting

[Diagram: the test database divided into subsets: Jill's test cases, Arthur's test cases, Mary's test cases, Lauri's test cases; C4 marks a single case]

Subsetting Scenario

[Diagram: a Test database holding cases C2, C3, C4, C5, C7, C8 and C9; the PROGRAM under test is run against case C4 only]

1 Extract before
2 Run Program
3 Extract after
4 Compare
5 Delete
6 Insert original
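The six numbered steps of this scenario can be sketched as one cycle in Python with SQLite. This is a minimal sketch under stated assumptions: the single `customers` table, the `program_under_test` batch and all function names are hypothetical stand-ins, and a real extract would span many related tables.

```python
import sqlite3


def extract(conn, cust_ids):
    """Return {cust_id: row} for the given cases."""
    marks = ",".join("?" * len(cust_ids))
    cur = conn.execute(
        f"SELECT cust_id, status, balance FROM customers WHERE cust_id IN ({marks})",
        list(cust_ids))
    return {row[0]: row for row in cur.fetchall()}


def compare(before, after):
    """Flag deleted, inserted and modified rows between two extracts."""
    return {
        "deleted": sorted(before.keys() - after.keys()),
        "inserted": sorted(after.keys() - before.keys()),
        "modified": sorted(k for k in before.keys() & after.keys()
                           if before[k] != after[k]),
    }


def delete(conn, cust_ids):
    marks = ",".join("?" * len(cust_ids))
    conn.execute(f"DELETE FROM customers WHERE cust_id IN ({marks})", list(cust_ids))


def insert(conn, rows):
    conn.executemany(
        "INSERT INTO customers (cust_id, status, balance) VALUES (?, ?, ?)",
        list(rows.values()))


def program_under_test(conn, cust_ids):
    """Hypothetical batch that consumes its cases by terminating them."""
    conn.executemany(
        "UPDATE customers SET status = 'TERMINATED', balance = 0 WHERE cust_id = ?",
        [(c,) for c in cust_ids])


def run_cycle(conn, cust_ids):
    before = extract(conn, cust_ids)      # 1 Extract before
    program_under_test(conn, cust_ids)    # 2 Run Program
    after = extract(conn, cust_ids)       # 3 Extract after
    diff = compare(before, after)         # 4 Compare
    delete(conn, cust_ids)                # 5 Delete
    insert(conn, before)                  # 6 Insert original
    return diff


def build_test_db():
    """Tiny in-memory test database with cases C2, C4 and C5 (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, status TEXT, balance INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                     [(2, "ACTIVE", 800), (4, "ACTIVE", 1200), (5, "ACTIVE", 300)])
    return conn
```

After `run_cycle(conn, [4])` the returned diff reports case 4 as modified, while the database itself holds the original row for case 4 again, so the case is not consumed and the other cases are never touched.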

Impact on Program/DB Design

• Batch programs should be able to operate on subsets of cases
– so as not to consume and disturb the whole database!
– external parameters (e.g. list of customers) or other indicators
– new columns/tables in database for subsetting?
• Soft Date
– Don't get the date from the system, give it as a parameter
• Choice of identifiers
– surrogate keys/logical keys
• How identifiers are generated
– surrogate table
– sequence
– select max(key)+1 from table
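The first two design points above (operate on a subset of cases, and take the date as a parameter) can be illustrated with a small batch function. This is a hedged sketch: the function `terminate_expired`, the contract fields and the status values are invented for the example and do not come from any system described in the talk.

```python
from datetime import date


def terminate_expired(contracts, run_date, customer_ids=None):
    """Terminate active contracts whose end date lies before run_date.

    run_date is the "soft date": it is passed in, never read from the system clock.
    customer_ids optionally restricts the run to a subset of customers.
    Returns the customer ids whose contracts were changed.
    """
    touched = []
    for contract in contracts:
        if customer_ids is not None and contract["customer_id"] not in customer_ids:
            continue  # outside the subset: leave the case untouched
        if contract["status"] == "ACTIVE" and contract["end_date"] < run_date:
            contract["status"] = "TERMINATED"
            touched.append(contract["customer_id"])
    return touched


# Demo data only: three contracts that have all expired by mid-2008.
contracts = [
    {"customer_id": 2, "status": "ACTIVE", "end_date": date(2008, 1, 31)},
    {"customer_id": 4, "status": "ACTIVE", "end_date": date(2008, 2, 29)},
    {"customer_id": 5, "status": "ACTIVE", "end_date": date(2008, 3, 31)},
]
touched = terminate_expired(contracts, run_date=date(2008, 6, 1), customer_ids={4})
```

Because the run is limited to customer 4, the contracts of customers 2 and 5 stay active even though they have also expired as of the soft date, so they remain available to other testers.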

Case: Company X

• X is a Swedish insurance company that specialises in Labour Pension Insurances
• X recently renewed nearly all of its application portfolio
– Billing, payouts, insurance, DW, actuary, extranet...
– Went live April 1st 2008
• Large project with a budget of about 100M€
– development time 6 years
– up to 150 persons involved in the project

Case: X

• Technical platforms: >5
• Kinds of DBMS's: 4
• Number of databases: ~20
• Number of tables: >1200
• Number of integration interfaces: ~100
• Number of batches: 150
• Number of online dialogs: 100
• Number of test cases: >1400

Case: X

[Diagram: data flows between PROD, a TEST DATA DB, and the DEV, SYSTEST and ACCEPTANCE environments. Labels: 100%; 5% (three times); Fast Track / 1-10 at a time; Take/restore snapshot; QTP]

Case: X

Optim EXTRACT Process Report

Extract File      : K87376.TDSRES.B54.S004.EAF.SEXT.XF
Access Definition : TDS.EAF.EXTRACT
Created by        : Job K87376, using SQLID K87376 on DB2 Subsystem DB2P
Time Started      : 2008-03-26 08.21.27
Time Finished     : 2008-03-26 08.48.37

Process Options:
  Process Mode        : Batch
  Retrieve Data using : DB2
  Limit Extract Rows  : 40000000
  RowList             : 'K87376.TDSRES.B54.S004.EAF.SEXT.PNS'

Total Number of Extract Tables : 112

Total Number of Extracted Rows : 6676868

Total Number of First Pass Start Table Rows : 117172

5% of a total of 2M persons
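The figures in this report can be put in proportion with a little arithmetic. The sketch below uses only numbers printed on the slide; it assumes that one start-table row corresponds to one person and treats "2M" as exactly 2,000,000, which is an approximation.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H.%M.%S"  # timestamp layout used in the report
started = datetime.strptime("2008-03-26 08.21.27", FMT)
finished = datetime.strptime("2008-03-26 08.48.37", FMT)

extracted_rows = 6676868      # Total Number of Extracted Rows
start_table_rows = 117172     # Total Number of First Pass Start Table Rows
total_persons = 2_000_000     # "2M persons" on the slide, approximate

elapsed_seconds = (finished - started).total_seconds()
rows_per_second = extracted_rows / elapsed_seconds
rows_per_start_row = extracted_rows / start_table_rows    # assumes one start row per person
share_of_persons = start_table_rows / total_persons

print(f"elapsed:            {elapsed_seconds / 60:.1f} min")
print(f"throughput:         {rows_per_second:.0f} rows/s")
print(f"rows per start row: {rows_per_start_row:.1f}")
print(f"share of persons:   {share_of_persons:.1%}")
```

Under these assumptions the extract ran for about 27 minutes, at roughly 4,100 rows per second, pulled about 57 related rows per start-table row across the 112 tables, and covered close to 6% of the population, in line with the "5%" quoted on the slide.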

X: Life Cycle Tests

• Test environment was loaded with one person at a time
– from a set of about 30 persons with different profiles
• 10 months worth of batches were run at the rate of about 7 min/month
– Batches used "soft date" to simulate time flow
• Before/after compares were made on the database
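The life cycle test described above, with time simulated through a soft date, might look roughly like this in code. Everything here is a hypothetical stand-in: the `monthly_batch` function, the person fields and the premium logic are invented to show the mechanics of stepping the date month by month and comparing before and after.

```python
from datetime import date


def month_starts(first, months):
    """Yield the first day of `months` consecutive months, beginning with `first`."""
    for i in range(months):
        total = first.month - 1 + i
        yield date(first.year + total // 12, total % 12 + 1, 1)


def monthly_batch(person, run_date):
    """Hypothetical batch: accrue one premium per run; run_date is the soft date."""
    person["balance"] += person["monthly_premium"]
    person["last_run"] = run_date


def run_life_cycle(person, first_month, months):
    """Run the batch once per simulated month and return a before/after compare."""
    before = dict(person)  # snapshot taken before the first batch
    for run_date in month_starts(first_month, months):
        monthly_batch(person, run_date)
    return {key: (before[key], person[key])
            for key in person if before[key] != person[key]}


# Demo data only: one person, ten simulated months starting April 2008.
person = {"id": "P1", "balance": 0, "monthly_premium": 100, "last_run": None}
changes = run_life_cycle(person, date(2008, 4, 1), 10)
```

Since the date is a parameter, ten months of processing take only as long as the batches themselves need to run, and the returned dictionary lists each changed field with its value before and after.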

De-identifying Sensitive Data

• Tightening regulation
• Outsourcing
– providing your contractors with good test data is essential
– however, security issues become important

De-identifying Issues

• How to de-identify
– use an algorithm to create a new id (social security nr)
– create a random id and save it in a lookup table
  • always use the same?
– use a random lookup table for names
• Issues
– propagating changes to foreign keys
– introducing company-wide schemes
– introducing extra-company schemes
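The two ways of creating a new id listed above, and the foreign key issue, can be sketched as follows. This is an illustration under stated assumptions, not a vetted anonymisation scheme: the keyed-hash approach is one possible "algorithm", the 10-digit format is arbitrary (real social security numbers usually carry check digits and dates that are not modelled here), and all names are hypothetical. Truncating a hash to 10 digits can also produce collisions, which a real scheme would need to handle.

```python
import hashlib
import hmac
import secrets


def pseudonym_by_algorithm(real_id, secret_key, digits=10):
    """Derive a new id from the real one with a keyed hash (same input, same output)."""
    digest = hmac.new(secret_key, real_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return str(int(digest, 16) % 10 ** digits).zfill(digits)


class LookupTable:
    """Random ids remembered in a table, so the same person always gets the same id."""

    def __init__(self):
        self._map = {}      # real id -> pseudonym
        self._used = set()  # pseudonyms already handed out

    def pseudonym(self, real_id, digits=10):
        if real_id not in self._map:
            while True:
                candidate = "".join(secrets.choice("0123456789") for _ in range(digits))
                if candidate not in self._used:
                    break
            self._used.add(candidate)
            self._map[real_id] = candidate
        return self._map[real_id]


def deidentify(tables, id_columns, mapper):
    """Apply `mapper` to every column that holds the id, foreign keys included.

    tables:     {table name: list of row dicts}
    id_columns: {table name: columns in that table containing the id}
    """
    result = {}
    for table, rows in tables.items():
        columns = id_columns.get(table, ())
        result[table] = [
            {col: (mapper(val) if col in columns else val) for col, val in row.items()}
            for row in rows
        ]
    return result
```

Passing the same mapper for the primary key column and for every foreign key column that refers to it keeps the references intact after de-identification. The algorithmic variant needs no stored table but depends on keeping the key secret; the lookup table variant must itself be protected, since it links real ids to pseudonyms.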

Create a TDM-System

• Wrapping it up by building a Test Data Management System
– process for copying data from one environment to the other
– automated system to minimize the manual work involved
– embed bookkeeping and de-identifying
– auditing and statistics "for free"

Sample Excel for Ordering Data

Test Data Management

• Plan
• Centralize
• Use a tool

Test data management must be an integral part of application testing and development, and not an afterthought.