
Agile Data Warehousing with Production Sandboxing

Stephen Brobst, Chief Technology Officer, Teradata Corporation

Oliver Ratzesberger, Senior Director, eBay

Adastra Information Management Conference 2009

The Carlu, Toronto, Canada

April 23, 2009

Stephen Brobst, Teradata Corporation: stephen.[email protected]

Oliver Ratzesberger, eBay: oratzesberger@ebay.com


Agenda

• The Business Need

• Implementation Options

• Governance

• eBay Case Study

• Discussion

The Business Need


Sustaining Success in Data Warehousing

The biggest enemy to sustained success in data warehousing is stagnation.

Continued delivery of value from a data warehouse demands that organizations aggressively encourage new and creative methods in their use of information and analytics.

If an organization does not evolve and improve its data warehouse, value will diminish over time.

Value of a New Analytic Capability

[Figure: value of a new analytic capability plotted against time, with phases labeled Initial Deployment and Repeated Use]

Value of a New Analytic Capability

When a report or analytic capability is first introduced, it offers the potential for new insight and new ways of exploiting information.

• Value initially increases as adoption of the new capability takes place.

• As time goes on, those once-groundbreaking insights become old news.

• The value of the report or analytic capability begins to decrease over time once the organization has incorporated the insights into its standard operating procedures.

Over time, the data warehouse will be burdened with generating lots and lots of reports that eventually deliver less value than the cost of maintaining them.

The Requirement for Innovation

Organizations must engage in two actions to ensure the sustained success of a data warehouse:

1. Constantly innovate and develop new capabilities.

2. Prune older capabilities where the value no longer justifies the cost.

It is not enough to encourage and sponsor innovation... organizations must provide an appropriate platform and environment to facilitate innovation in the area of information exploitation.
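Action 2 above is, at heart, a periodic cost/benefit check. The following is a minimal Python sketch, assuming an organization can estimate a monthly business value and a monthly maintenance cost for each report; the `Capability` class, its fields, and the sample figures are hypothetical illustrations, not data from eBay or Teradata.

```python
from dataclasses import dataclass


@dataclass
class Capability:
    """A report or analytic capability (hypothetical fields for illustration)."""
    name: str
    monthly_value: float  # estimated business value delivered per month
    monthly_cost: float   # estimated cost to run and maintain per month


def prune_candidates(capabilities: list[Capability]) -> list[str]:
    """Names of capabilities whose value no longer justifies their cost."""
    return [c.name for c in capabilities if c.monthly_value < c.monthly_cost]


# Illustrative inputs only.
reports = [
    Capability("daily_sales_summary", monthly_value=12_000, monthly_cost=3_000),
    Capability("legacy_promo_report", monthly_value=500, monthly_cost=2_000),
]
print(prune_candidates(reports))  # ['legacy_promo_report']
```

The comparison itself is trivial; the hard part in practice is producing credible value estimates for each report.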


Use Case

Marketing has an outside source of data that it wants to bring into the data warehouse, but it is not yet sure whether the data has high value... so we need to experiment with the data before signing the purchase contract.

Rigorous Solution Methodology

[Figure: rigorous solution methodology, with phases Strategy, Research, Analyze, Design, Equip, Build, Integrate, and Manage under project management, and work products including Opportunity Assessment, Enterprise Assessment, Application Requirement, System Architecture, Logical Model, Package Adaptation, Data Mapping, Infrastructure & Education, and User Curriculum]

Rigorous Solution Methodology

Takes too long!

Agile Data Warehousing

The Manifesto for Agile Software Development puts forth the following values:

• Value individuals and interactions over processes and tools,

• Value working software over comprehensive documentation,

• Value customer collaboration over contract negotiation, and

• Value response to change over following a plan.

While it is recognized that there is value in the items on the right, the items on the left are valued even more in the agile development methodology.

Agile Data Warehousing


The underlying philosophy that drives these values in the context of data warehousing is to put the highest priority on satisfying end user (knowledge worker) requirements through early and continuous delivery of analytic capability.

This does not mean that requirements documents, design documents, entity-relationship diagrams, data dictionaries, etc. are not important, but it does mean that the emphasis is much more on delivery than on process.

Agile Data Warehousing

Goal: Same-day availability of data into the analytic environment.

Agile Data Warehousing

Need a way to get data into the data warehouse without the overhead of a full-blown development methodology:

• Allow for "load and go" analytics.

• Non-certified content to be used in cooperation with content in the enterprise data warehouse (EDW).

• Limited users and limited use.

Implementation Options


Option 1: Separate Development System

• Deploy the "experimental" data in the development/test environment.

• Configure the development/test environment to the full size of the production environment.

• Demonstrate the value of the data and then (assuming positive ROI) use best practices to bring the data into the production DW.

Option 1: Separate Development System

Objection: It costs too much to have the development/test environment at full size and keep it fully up to date with the production environment.

Option 2: Downsized Development System

• Deploy the "experimental" data in the development/test environment.

• Configure the development/test environment with "sampling" from the production environment.

• Demonstrate the value of the data and then (assuming positive ROI) use best practices to bring the data into the production DW.

Option 2: Downsized Development System

Objection: It is more difficult to implement a prototype with sampled data sets, and more difficult to make the ROI case.
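One common way to do the "sampling" in Option 2, offered here as an illustration rather than as the method the authors used, is to sample by a hash of a shared key so that related rows in different tables are kept or dropped together and joins still work. A minimal Python sketch; the table shapes, the `customer_id` key, and the 10% rate are assumptions.

```python
import hashlib


def in_sample(key: str, percent: int) -> bool:
    """Deterministically keep roughly `percent`% of keys via a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


# Hypothetical production tables that share customer_id (5 orders per customer).
customers = [{"customer_id": f"c{i}"} for i in range(1000)]
orders = [{"order_id": i, "customer_id": f"c{i % 1000}"} for i in range(5000)]

# Each table is sampled independently, but by the same key hash,
# so every sampled order still finds its customer in the sample.
sampled_customers = [c for c in customers if in_sample(c["customer_id"], 10)]
sampled_orders = [o for o in orders if in_sample(o["customer_id"], 10)]

sampled_ids = {c["customer_id"] for c in sampled_customers}
assert all(o["customer_id"] in sampled_ids for o in sampled_orders)
```

Even with join integrity preserved, totals computed on the sample (revenue, counts) have to be scaled back up, and rare segments may vanish entirely, which is part of why the ROI case is harder to make on sampled data.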


Option 3: Federated Development System

• Deploy the "experimental" data in the development/test environment.

• Join across the development and production environments to perform the analysis.

• Demonstrate the value of the data and then (assuming positive ROI) use best practices to bring the data into the production DW.

Option 3: Federated Development System

Objection: Performance sucks when joining large data sets across a network (even when both systems are Teradata!).


Option 4: Production Sandbox

• Deploy the "experimental" data directly into the production environment.

• Separate Teradata "databases" for sandbox data, with joins allowed to the production data, all on a single system.

• Demonstrate the value of the data and then (assuming positive ROI) use best practices to bring the data into the production DW.
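The point of Option 4 is that sandbox tables live in their own database but can be joined directly to production tables on the same system, with no network hop. A minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for Teradata: the `main` database plays the production EDW and an attached `sandbox` database holds the experimental data. The table names and values are hypothetical.

```python
import sqlite3

# SQLite stands in for Teradata: two named databases on one system.
conn = sqlite3.connect(":memory:")  # 'main' plays the production EDW
conn.execute("ATTACH DATABASE ':memory:' AS sandbox")

conn.execute("CREATE TABLE main.customer (customer_id INTEGER PRIMARY KEY, region TEXT)")
conn.executemany("INSERT INTO main.customer VALUES (?, ?)", [(1, "West"), (2, "East")])

# Experimental vendor data lands only in the sandbox database.
conn.execute("CREATE TABLE sandbox.vendor_score (customer_id INTEGER, score REAL)")
conn.executemany("INSERT INTO sandbox.vendor_score VALUES (?, ?)", [(1, 0.9), (2, 0.4)])

# The join runs locally across the two databases.
rows = conn.execute("""
    SELECT c.region, v.score
    FROM main.customer AS c
    JOIN sandbox.vendor_score AS v ON v.customer_id = c.customer_id
    ORDER BY c.customer_id
""").fetchall()
print(rows)  # [('West', 0.9), ('East', 0.4)]
```

On Teradata the same query would qualify each table with its database name; what matters is that both databases sit on one system, so the join never crosses a network.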

Option 4: Production Sandbox

Objection: End users manage their own space and have create-table privileges on the production system.


Governance


Controls

• No "production" reporting from the sandbox environment.

• Automated resource governors to prevent "runaway" queries in the sandbox area.

• Data residency no more than XX days.
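The last two controls lend themselves to automation. A minimal Python sketch, assuming a hypothetical catalog that records each sandbox table's creation date and each running query's CPU seconds. The slide leaves the residency limit as "XX days", so it is a parameter here; the 90-day and 600-second values in the example are purely illustrative. On a real Teradata system the query governor would more likely be configured in the database's own workload management than in a script like this.

```python
from datetime import date, timedelta


def expired_tables(created: dict[str, date], today: date, max_residency_days: int) -> list[str]:
    """Sandbox tables that have exceeded the residency limit (candidates for drop or review)."""
    cutoff = today - timedelta(days=max_residency_days)
    return sorted(name for name, created_on in created.items() if created_on < cutoff)


def runaway_queries(cpu_seconds: dict[str, float], limit: float) -> list[str]:
    """Sandbox queries whose CPU consumption exceeds the governor's limit."""
    return sorted(query for query, cpu in cpu_seconds.items() if cpu > limit)


# Illustrative values only.
sandbox_tables = {
    "mkt_vendor_trial": date(2009, 1, 5),
    "mkt_campaign_test": date(2009, 4, 1),
}
print(expired_tables(sandbox_tables, today=date(2009, 4, 23), max_residency_days=90))
# ['mkt_vendor_trial']
print(runaway_queries({"q101": 12.5, "q102": 4_800.0}, limit=600.0))
# ['q102']
```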


Promoting Content into the EDW

• Monitoring and "promoting" data content in the sandbox.

• Once the value is proven, use "proper" methodologies to integrate content into the enterprise data warehouse.

• Encourage refinement of requirements as part of the sandbox experience.

Case Study

The eBay Experience

eBay Analytics Technology Highlights

• >50 TB/day of new, incremental data

• >100k data elements

• >50A10 new records/day

• >50k chains of logic

• >5,000 business users & analysts

• Active/Active

• Turning over a TB every 5 seconds

• 24x7x365, always online

• Millions of queries/day

• 99.9+% availability

• Near-real-time

eBay Analytics Core

[Figure: eBay analytics architecture. Front-end tools (MicroStrategy, Unica, Crystal, SAS, SQL) over primary and secondary MPP relational data on Teradata running Linux, linked by local interconnects and a wide-area interconnect (1000 miles); a Sun Fire MPP/HPC/grid system on Solaris holding XML, name/value, and raw data; Phoenix, AZ; data integration with Informatica; capacity figures shown include 2.2 PB and 6.6 PB]

Design for the Unknown

• >85% of eBay analytical workload is NEW and unknown.

• Exploration is the core of an analytical company.

• The metrics you know are "cheap".

• The metrics you don't know are expensive, but also high in potential ROI.

• Design can't be static or dependent on specific questions or dimensions.

"We Need Data Marts!"

[Figure: survey chart of respondents by data warehouse and data mart architecture (centralized warehouse, hub-and-spoke, coordinated data marts)]

Data Mart Dilemma

• Total Cost of Ownership (TCO): fully loaded cost a staggering $500k++.

• Biggest drivers are:
  • Maintaining separate databases
  • Weekly/daily/hourly data transfers
  • Data inconsistencies
  • Data redundancy
  • Increased complexity
  • Loss of lineage over time

Analytics as a Service

• Massive-scale analytical utility computing.

• Bring your data, perform your analytics.

• From simple web-based data upload to fully private utility access.

• Combine data and code with ALL existing data.

Analytics as a Service

[Screenshot: Analytics as a Service web interface]

Analytics as a Service

From a simple web-based table upload:

[Screenshot: web-based table upload form for CSV data]
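A web-based upload like this has to turn a user's file into a table definition. The Python sketch below shows one simple way to do that, assuming a well-formed CSV with a header row and no ragged rows. The type-inference rules, the `pet_marketing` database name, and the table name are hypothetical, and column names are not sanitized, so this is not a description of eBay's actual tool.

```python
import csv
import io


def infer_type(values: list[str]) -> str:
    """Pick a simple SQL type that fits every non-empty value in a column."""
    non_empty = [v for v in values if v != ""]

    def all_parse(parse) -> bool:
        try:
            for v in non_empty:
                parse(v)
            return True
        except ValueError:
            return False

    if non_empty and all_parse(int):    # ignores INTEGER range limits
        return "INTEGER"
    if non_empty and all_parse(float):
        return "DECIMAL(18,4)"
    width = max((len(v) for v in non_empty), default=1)
    return f"VARCHAR({width})"


def create_table_sql(database: str, table: str, csv_text: str) -> str:
    """Build a CREATE TABLE statement from the header and rows of a CSV file."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = [f"    {name} {infer_type([row[i] for row in data])}"
               for i, name in enumerate(header)]
    return f"CREATE TABLE {database}.{table} (\n" + ",\n".join(columns) + "\n);"


sample = "vendor_id,score,segment\n101,0.75,gold\n102,0.5,silver\n"
print(create_table_sql("pet_marketing", "vendor_trial", sample))
```

Run on the sample, this produces a table with `vendor_id INTEGER`, `score DECIMAL(18,4)`, and `segment VARCHAR(6)`. A production tool would also need to quote identifiers and handle encodings and oversized values.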


...to fully private utility access

• We call them PETs (Prototyping Environments in Teradata).

• More than 75 are active right now.

• In most cases they are small (100 GB to 5 TB), since all the main data is already in the EDW.

• They are free to the business units.

Analytics as a Service: Benefits

• Improved time to market: days/weeks versus months.

• Enable the business to do agile prototyping.

• Enable the users to "Fail Fast": make it easy to try out new ideas.

• Eliminate stray data marts.

Governance Rule 1

• Keep the production data clean.
  • The data life cycle methodology is there for a reason.
  • Do not "pollute" production data with data of unknown source and validation.
  • Equivalent to a viral injection... and you may not recover.

• Do not inject prototype data into "core" DW data:
  • Data ingest (ETL/ELT) does NOT have access to the sandbox.
  • Not even to populate the sandbox.
  • Strictly and conceptually enforced on both batch and user accounts.
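In the database itself, Rule 1 would be enforced with privileges (GRANT/REVOKE) on the ETL and user accounts. The Python sketch below only restates the rule as a testable policy function, assuming two hypothetical account types and a set of sandbox database names; it is an illustration, not eBay's implementation.

```python
from enum import Enum


class Account(Enum):
    ETL = "etl"                    # batch ingest accounts that load the core EDW
    SANDBOX_USER = "sandbox_user"  # named PET personnel


def is_allowed(account: Account, database: str, action: str, sandbox_dbs: set[str]) -> bool:
    """Hypothetical check of Governance Rule 1 for a 'read' or 'write' action."""
    in_sandbox = database in sandbox_dbs
    if account is Account.ETL:
        # Ingest never touches the sandbox, not even to populate it.
        return not in_sandbox
    if account is Account.SANDBOX_USER:
        # Prototype users may read production data but write only inside the
        # sandbox, so prototype data never lands in core DW tables.
        return action == "read" or in_sandbox
    return False


SANDBOXES = {"pet_marketing"}
print(is_allowed(Account.ETL, "pet_marketing", "write", SANDBOXES))     # False
print(is_allowed(Account.SANDBOX_USER, "edw_prod", "read", SANDBOXES))  # True
print(is_allowed(Account.SANDBOX_USER, "edw_prod", "write", SANDBOXES)) # False
```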


Governance Rule 2

• Prototypes are written by experienced personnel:
  • PETs are assigned to NAMED personnel.
  • Previous experience and training are required.

• Prototype personnel are typically former DW developers who transitioned into a business unit, bringing:
  • Speed of implementation.
  • Knowledge of DW processes and methodologies.
  • Knowledge of the data.

Governance Rule 3

• Sunset dates must be applied:
  • Hold a post mortem.
  • Retire it or promote it.

• The prototype must not become a "black market" production application:
  • Business cannot depend on it.
  • DW cannot give it appropriate support.
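Rule 3 reduces to a simple lifecycle: once the sunset date passes, the post mortem must end in exactly one of two outcomes. A minimal Python sketch, assuming a hypothetical `Prototype` record whose `value_proven` flag captures the post mortem's conclusion; the names and dates are illustrative.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Prototype:
    name: str
    owner: str          # the named person accountable for the PET
    sunset: date
    value_proven: bool  # conclusion of the post mortem (hypothetical field)


def sunset_action(p: Prototype, today: date) -> str:
    """Return 'keep' before the sunset date, else 'promote' or 'retire'."""
    if today < p.sunset:
        return "keep"
    # No third option: a prototype past its sunset may not linger as a
    # "black market" production application.
    return "promote" if p.value_proven else "retire"


pet = Prototype("vendor_trial", owner="jdoe", sunset=date(2009, 6, 30), value_proven=True)
print(sunset_action(pet, date(2009, 5, 1)))  # keep
print(sunset_action(pet, date(2009, 7, 1)))  # promote
```

Here "promote" means handing the content to the EDW's full development methodology, not keeping the prototype running.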

Key Process 1

Pre-defined methods, templates, and rules for setup and teardown:

• Well-defined rules for usage.

• Defined, named owners.

• Pre-defined security templates.

• Pre-defined Help Desk responses to add/drop users.
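Pre-defined setup and teardown templates can be as simple as parameterized SQL scripts. The Python sketch below uses `string.Template` to generate them. The database, role, and user names, the 500 GB size, and the privilege list are hypothetical; the statements are written in Teradata SQL style (for example, `DELETE DATABASE` to empty a database before `DROP DATABASE`), but they should be checked against the Teradata documentation for the release in use.

```python
from string import Template

# Hypothetical templates; $pet is the sandbox (PET) database name.
SETUP = Template(
    "CREATE DATABASE $pet FROM $parent AS PERMANENT = $perm_bytes;\n"
    "CREATE ROLE ${pet}_users;\n"
    "GRANT SELECT ON $prod_db TO ${pet}_users;\n"
    "GRANT SELECT, INSERT, UPDATE, DELETE, CREATE TABLE, DROP TABLE ON $pet TO ${pet}_users;"
)
# Pre-defined Help Desk responses for adding and dropping users.
ADD_USER = Template("GRANT ${pet}_users TO $user;")
DROP_USER = Template("REVOKE ${pet}_users FROM $user;")
TEARDOWN = Template(
    "DELETE DATABASE $pet;\n"  # remove all objects so the database can be dropped
    "DROP DATABASE $pet;\n"
    "DROP ROLE ${pet}_users;"
)

params = {"pet": "pet_marketing", "parent": "pet_root",
          "perm_bytes": 500 * 10**9, "prod_db": "edw_prod"}
print(SETUP.substitute(params))
print(ADD_USER.substitute(pet="pet_marketing", user="jdoe"))
print(TEARDOWN.substitute(pet="pet_marketing"))
```

Keeping every PET on the same templates is what makes setup, user changes, and teardown routine Help Desk tasks rather than one-off DBA work.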


Key Process 2

Help Desk support is critical.

• Direct access for PET personnel to the most senior architectural and technical personnel.

• "Bidirectional" mentoring:
  • Best and brightest technical resources get closer to the business...
  • Business gets closer to fast and effective implementations.

• It does not take long for PET personnel to become self-sufficient.

Key Learning

• In-place processes enable "time-to-market" benefits.
  • Put the processes and security in place first.

• Failure = Learning.
  • Do so with great effectiveness...
  • Fail fast, fail early.

• Most business units now maintain a permanent sandbox.
  • Complex analysis and decision making within a business day!

Questions?

References

[1] Higgins, D. Don't Just Tread Water. Teradata Magazine, Volume 8, Number 1, 2008, pp. 23-25.

[2] Brobst, S., M. McIntire, and E. Rado. Agile Data Warehousing with Integrated Sandboxing. The BI Journal, First Quarter, 2008.

[3] Beck, K., M. Beedle, A. van Bennekum, A. Cockburn, W. Cunningham, M. Fowler, J. Grenning, J. Highsmith, A. Hunt, R. Jeffries, J. Kern, B. Marick, R. Martin, S. Mellor, K. Schwaber, J. Sutherland, and D. Thomas. The Manifesto for Agile Software Development. February 2001.