Future proofing your IM investment; insure against business change
A member firm of Ernst & Young Global Limited Liability limited by a scheme approved under Professional Standards Legislation
Table of Contents
Today’s Reality
Data Warehouse Architecture Tiers
Where Change is Inevitable
Data Warehouse Design ‘Gurus’
Factoring in Change
Phase 0 - Signs of Life
Same Technique Regardless of Warehouse Maturity
SDLC/Waterfall versus Agile Approach
The Agile Architecture Approach
The Data Vault
Anchor Modelling
Defining a Common Organizational Model
Step 1
Step 2
Step 3
Phase 1 - The New Born
Remember
Phase 1+ - Growing up
Our Snapshot Recommendation
Today’s Reality
Why do people build Data Warehouses based only on how things are now, without the inherent ability to
adapt or change?
Data Warehouse Architecture Tiers
The predominant data warehouse architecture built today is based around three key tiers:
(i) Get the data from the source systems & providers;
(ii) Combine or integrate the data, apply quality checks and calculate new metrics; and
(iii) Format the data to make reporting easy and efficient.
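As a minimal sketch of these three tiers, the following Python models each tier as one function. The source rows, field names and the quality rule are hypothetical and stand in for real source systems and checks.

```python
from collections import defaultdict


def acquire():
    """Tier 1: get the data from the source systems (hard-coded, hypothetical rows)."""
    return [
        {"source": "crm", "customer": "C-001", "amount": "120.50"},
        {"source": "billing", "customer": "C-001", "amount": "30.00"},
        {"source": "billing", "customer": "C-002", "amount": "not-a-number"},
    ]


def integrate(rows):
    """Tier 2: combine the data, apply a quality check and calculate a new metric."""
    totals = defaultdict(float)
    rejected = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            # Quality check: rows with an unparseable amount are set aside.
            rejected.append(row)
            continue
        # New metric: total amount per customer across all sources.
        totals[row["customer"]] += amount
    return dict(totals), rejected


def present(totals):
    """Tier 3: format the data to make reporting easy."""
    return [f"{customer}: {total:.2f}" for customer, total in sorted(totals.items())]


totals, rejected = integrate(acquire())
report_lines = present(totals)
```

Keeping each tier behind its own function means a change in one tier (for example, a replaced source system in `acquire`) need not ripple into the others.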
Where Change is Inevitable
Each of these three tiers is subject to change. For instance:
Tier 1. Old legacy systems are replaced with different or newer ones;
Tier 2. The business itself is constantly changing to improve and maximize its worth; and
Tier 3. Users of the data require more information and insight to make their decisions.
Data Warehouse Design ‘Gurus’
Typically, data warehouse design follows one of three ‘gurus’: Bill Inmon (Inmon), Ralph Kimball
(Kimball) or Dan Linstedt (Linstedt). The contents and design of tiers two and three can therefore
differ depending on which approach is followed.
Each ‘guru’ has his followers, and each provokes passionate debate regarding the strengths and weaknesses of
his approach.
What is commonly overlooked, however, is that a data warehouse goes through phases of evolution and that,
depending on the phase, the development approach or methodology chosen has more impact on future
proofing than the specific design construct chosen.
Factoring in Change
Ralph Hughes, Chief Systems Architect at Ceregenics, has published several books on the subject of agile
data warehousing. We have found that this methodology is most effective for data warehouse development
as it actively factors in change.
You must be able to adjust the design construct throughout the evolution of the data warehouse to handle
changing requirements, and this can be challenging.
A critical factor in any data warehouse design and build iteration is the ability to model the solution so that
you can review and modify the design. Some data modelers assume that there can only be one “right”
model. The extensive research behind Graeme Simsion’s book Data Modeling Theory and Practice
(Simsion, 2007) concludes that there can be multiple models, each with relative merits when judged
according to various factors, one of which is ability to adapt to change. The assembly of several candidate
models facilitates the explicit evaluation of options, and provides a critical communications asset ensuring
that everyone involved has the same picture in their head of what is being built and why.
Publications from renowned authors like Scott Ambler (Ambler, 2003) (Ambler, 2004) (Collier, 2011), John
Giles (Giles, 2011) and Len Silverston (Silverston, 2009) (Silverston, 2012) cover the subject of Data
Modelling using an Agile methodology.
We will now explain the design constructs used at specific stages during the evolution of a data warehouse
and the specific techniques that insure your investment against inevitable future change to your underlying
data structures.
Phase 0 - Signs of Life
A data warehouse starts life with individuals and groups within an organization collecting data and
information to produce reports and make decisions.
Over time, some of these individuals connect up into localized workgroups or streams within a company.
They compare what each is doing and adapt to generate a consistent approach. This goes on throughout a
company, most often with each group unaware of what the others are doing.
This situation can exist even when a very mature data warehouse is operating.
Same Technique Regardless of Warehouse Maturity
The technique remains the same for both a new data warehouse and an existing, mature environment, even
though the collective knowledge and experience might be at different levels.
You must:
1. Establish what is driving the effort;
2. Work out where the information is coming from and in what form;
3. Establish the dynamics, relationships and construction of the data;
4. Define what checks need to be applied to the incoming data;
5. Define if, where or how to store the data;
6. Define what value-add or derived information needs to be added;
7. Establish what form the output or report needs to be in; and
8. Build something and check with the owner if it’s okay.
SDLC/Waterfall versus Agile Approach
An SDLC or waterfall approach is likely to involve doing each of the above in sequence and in full, prior to
moving on to the next step.
An agile approach would potentially cycle through the steps a number of times, refining and enhancing each
time, based on feedback from the person who will be the eventual user of the work, known in agile terms as
“the product owner”.
Both methods face the same issue, because each of the steps above can result in change when
revisited at a later date:
How can I (or should I) design what I’m doing now to be useful in the future and require minimal
effort if something changes later on?
The Agile Architecture Approach
The ideal Agile architecture approach when presented with new, unquantified requirements is to initially
adopt the Kimball method.
Establish a source of the data;
Turn it into a star schema;
Prototype a report; and
Get feedback from the owner.
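One iteration of this loop can be sketched as follows. This is a minimal illustration using Python's standard-library sqlite3 module; the sales extract and the `dim_product` and `fact_sales` names are hypothetical.

```python
import sqlite3

# Hypothetical source extract: (product name, category, quantity, amount).
source_rows = [
    ("Widget", "Hardware", 2, 20.0),
    ("Widget", "Hardware", 1, 10.0),
    ("Manual", "Books", 3, 45.0),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension: the descriptive context used to slice the report.
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        product_name TEXT UNIQUE,
        category TEXT
    );
    -- Fact: the measures, pointing at the dimension by key.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity INTEGER,
        amount REAL
    );
""")

for name, category, quantity, amount in source_rows:
    # Add the product to the dimension only if it is not already there.
    conn.execute(
        "INSERT OR IGNORE INTO dim_product (product_name, category) VALUES (?, ?)",
        (name, category),
    )
    (product_key,) = conn.execute(
        "SELECT product_key FROM dim_product WHERE product_name = ?", (name,)
    ).fetchone()
    conn.execute(
        "INSERT INTO fact_sales VALUES (?, ?, ?)", (product_key, quantity, amount)
    )

# Prototype report: quantity and amount by category, ready for owner feedback.
report = conn.execute("""
    SELECT d.category, SUM(f.quantity), SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product d ON d.product_key = f.product_key
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
```

Because the whole schema is rebuilt in memory on each run, reworking the dimensions or measures after feedback is cheap at this stage.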
This approach could be adopted for several iterations, reworking the schema and metrics. The same
approach could be taken when the next stream or group is identified that needs help generating the right
reports.
Essentially, you are duplicating the design with a different context. Initially this approach works fine. However,
as more information is introduced and more changes are made to the reporting requirements, it starts to
involve a lot of effort and rework, and can lead to confusion about which star schema designs are current
and which are old.
What does evolve from this process, however, is:
An understanding of the types of data required for the reporting (dimensions);
The dynamics or relationships between the data and the metrics or measures (facts); and
Firsthand experience with the quality of the data.
The Data Vault
Dan Linstedt came up with a construct called a Data Vault in 2000. The name is not truly representative,
as it implies the data is all locked up (which it isn’t). For the modelling types among you, the construct is
something akin to generalization.
It adopts the following basic pattern:
There are things (hubs);
These things have relationships with other things and even themselves (links); and
These things have their own attributes or qualities (satellites).
There are also advanced features such as point-in-time (PIT) tables, Bridges and User Groups.
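The basic pattern can be sketched as a handful of tables. This is a simplified illustration rather than a full Data Vault implementation: the customer and order entities, table names and integer keys are assumptions made for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hubs: one row per business key, plus load metadata (hypothetical entities).
    CREATE TABLE hub_customer (
        customer_key INTEGER PRIMARY KEY,
        customer_number TEXT UNIQUE,
        load_date TEXT,
        record_source TEXT
    );
    CREATE TABLE hub_order (
        order_key INTEGER PRIMARY KEY,
        order_number TEXT UNIQUE,
        load_date TEXT,
        record_source TEXT
    );
    -- Link: a relationship between hubs, stored separately from both.
    CREATE TABLE link_customer_order (
        link_key INTEGER PRIMARY KEY,
        customer_key INTEGER REFERENCES hub_customer(customer_key),
        order_key INTEGER REFERENCES hub_order(order_key),
        load_date TEXT,
        record_source TEXT
    );
    -- Satellite: descriptive attributes of a hub, versioned by load date.
    CREATE TABLE sat_customer_details (
        customer_key INTEGER REFERENCES hub_customer(customer_key),
        load_date TEXT,
        name TEXT,
        city TEXT,
        PRIMARY KEY (customer_key, load_date)
    );
""")

conn.execute("INSERT INTO hub_customer VALUES (1, 'C-001', '2016-01-01', 'crm')")
conn.execute("INSERT INTO hub_order VALUES (1, 'O-900', '2016-01-02', 'orders')")
conn.execute("INSERT INTO link_customer_order VALUES (1, 1, 1, '2016-01-02', 'orders')")
# Two versions of the same customer's details: history is kept, not overwritten.
conn.executemany(
    "INSERT INTO sat_customer_details VALUES (?, ?, ?, ?)",
    [(1, "2016-01-01", "Acme", "Sydney"), (1, "2016-06-01", "Acme", "Perth")],
)

# Latest details for every customer that is linked to an order.
current = conn.execute("""
    SELECT h.customer_number, s.city, o.order_number
    FROM hub_customer h
    JOIN link_customer_order l ON l.customer_key = h.customer_key
    JOIN hub_order o ON o.order_key = l.order_key
    JOIN sat_customer_details s ON s.customer_key = h.customer_key
    WHERE s.load_date = (
        SELECT MAX(load_date) FROM sat_customer_details
        WHERE customer_key = h.customer_key
    )
""").fetchall()
```

A new attribute set or relationship would be added as another satellite or link table, leaving the hubs untouched.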
Anchor Modelling
A similar construct was implemented in Sweden in 2004 and subsequently published by Lars Rönnbäck from
The Data Warehousing Institute in 2007.
“Anchor Modelling is an Agile information modelling technique that offers non-destructive
extensibility mechanisms enabling robust and flexible management of changes. A key benefit of
Anchor Modelling is that changes in a data warehouse environment only require extensions, not
modifications.” (Rönnbäck, 2011)
Anchor Modelling adopts a sixth normal form (6NF) approach and has the following pattern:
Things and events (anchors);
Properties of the things or events (attributes);
Relationships between things (ties); and
Shared or common properties between things (knots).
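The following sketch shows the four building blocks as tables and then applies a change purely as an extension. The customer and policy entities and the plain table names are hypothetical; Anchor Modelling has its own naming conventions, which are not reproduced here.

```python
import sqlite3


def table_definitions(connection):
    """Return {table name: CREATE statement} for every table in the database."""
    return dict(
        connection.execute("SELECT name, sql FROM sqlite_master WHERE type = 'table'")
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Anchors: identities only.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE policy (policy_id INTEGER PRIMARY KEY);
    -- Knot: a small, shared set of values.
    CREATE TABLE status_knot (status_id INTEGER PRIMARY KEY, status TEXT UNIQUE);
    -- Attributes: one table per property.
    CREATE TABLE customer_name (
        customer_id INTEGER REFERENCES customer(customer_id),
        name TEXT,
        changed_at TEXT,
        PRIMARY KEY (customer_id, changed_at)
    );
    CREATE TABLE policy_status (
        policy_id INTEGER REFERENCES policy(policy_id),
        status_id INTEGER REFERENCES status_knot(status_id),
        changed_at TEXT,
        PRIMARY KEY (policy_id, changed_at)
    );
    -- Tie: a relationship between anchors.
    CREATE TABLE customer_holds_policy (
        customer_id INTEGER REFERENCES customer(customer_id),
        policy_id INTEGER REFERENCES policy(policy_id),
        PRIMARY KEY (customer_id, policy_id)
    );
""")
schema_before = table_definitions(conn)

# A new requirement arrives: record customer email addresses.
# The change is one new attribute table; nothing existing is altered.
conn.execute("""
    CREATE TABLE customer_email (
        customer_id INTEGER REFERENCES customer(customer_id),
        email TEXT,
        changed_at TEXT,
        PRIMARY KEY (customer_id, changed_at)
    )
""")
schema_after = table_definitions(conn)

added_tables = set(schema_after) - set(schema_before)
unchanged = all(schema_after[name] == sql for name, sql in schema_before.items())
```

Comparing the table definitions before and after makes the "extensions, not modifications" property checkable rather than assumed.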
Defining a Common Organizational Model
So with these basic building blocks, it is possible to model the business data from the bottom up and at the
same time introduce some common generic patterns and themes. This approach is now more aligned with
the Inmon philosophy. In fact, these building blocks provide the ideal platform to start defining a common
model for the whole organization.
Here’s how.
Step 1
Establish what the various information types are: customer, product, order, policy, account, risk, asset, etc.
These can be aligned to the dimensions created previously.
Step 2
Establish where any of these are used together, linked, have a dependency, or a cross-reference. The same
pair of information types can have more than one of these links.
Step 3
Take all this and create a conceptual model. Some basic examples are shown below illustrating different
notations and different business subject domains.
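Steps 1 to 3 can also be expressed in code. The sketch below is only an illustration with a hypothetical subset of information types and link names; it records each link by name so that the same pair of types can carry more than one.

```python
from collections import defaultdict

# Step 1: the information types (a hypothetical subset).
information_types = {"customer", "product", "order", "policy"}

# Step 2: named links between types; a pair may appear more than once.
links = [
    ("customer", "order", "places"),
    ("order", "product", "contains"),
    ("customer", "policy", "holds"),
    ("customer", "policy", "is beneficiary of"),
]

# Step 3: assemble a simple conceptual model keyed by the pair of types.
conceptual_model = defaultdict(list)
for left, right, role in links:
    if left not in information_types or right not in information_types:
        raise ValueError(f"Unknown information type in link: {left} -> {right}")
    conceptual_model[(left, right)].append(role)
```

Rejecting links to unknown types keeps Step 2 honest: every relationship must refer to something established in Step 1.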
Phase 1 - The New Born
Once you have accepted that change is inevitable and that the data warehouse will evolve, the focus should
be on constructing your solution so that when change happens, the impact to existing reports, code and
processes is minimal or non-existent.
The aim is to separate the relationships from the data and the data from the keys.
Traditional 3NF binds the keys and relationships in with all the data. Adding a relationship, changing a key or
removing an existing relationship can each cause significant rebuilding of the underlying model, database, data
loads and reports.
Using internal (or surrogate) keys to provide the unique reference instead allows greater flexibility in the
design and minimizes or eliminates any impact when things change.
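A minimal sketch of this separation, assuming hypothetical Customer and Account entities: the business keys, the descriptive data and the relationship each sit in their own table, tied together only by surrogate keys. When the source system renumbers its customers, the report query is unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Entity tables: surrogate key plus business key only (hypothetical entities).
    CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, CustomerNumber TEXT UNIQUE);
    CREATE TABLE Account (AccountId INTEGER PRIMARY KEY, AccountNumber TEXT UNIQUE);
    -- Descriptive data lives apart from the keys.
    CREATE TABLE CustomerDetails (
        CustomerId INTEGER REFERENCES Customer(CustomerId),
        Name TEXT
    );
    -- Relationships live apart from both and refer only to surrogate keys.
    CREATE TABLE RelCustomerAccount (
        CustomerId INTEGER REFERENCES Customer(CustomerId),
        AccountId INTEGER REFERENCES Account(AccountId)
    );
    INSERT INTO Customer VALUES (1, 'OLD-42');
    INSERT INTO Account VALUES (10, 'ACC-7');
    INSERT INTO CustomerDetails VALUES (1, 'Acme Pty Ltd');
    INSERT INTO RelCustomerAccount VALUES (1, 10);
""")


def accounts_by_customer_name(connection):
    """An existing report: customer names with their account numbers."""
    return connection.execute("""
        SELECT d.Name, a.AccountNumber
        FROM CustomerDetails d
        JOIN RelCustomerAccount r ON r.CustomerId = d.CustomerId
        JOIN Account a ON a.AccountId = r.AccountId
    """).fetchall()


before = accounts_by_customer_name(conn)
# The source system is replaced and renumbers its customers.
conn.execute("UPDATE Customer SET CustomerNumber = 'NEW-0042' WHERE CustomerId = 1")
after = accounts_by_customer_name(conn)
```

Only one row in one table changed; the relationship and details tables never stored the business key, so they needed no update.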
Enhancing the sample conceptual models shown previously, the logical model now begins to evolve.
Remember
Naming conventions are important when building out the model.
Typically, relationships should be prefixed with the same identifier. In the example above this is simply
“Rel”.
The business entity attributes/details/properties are named the same as their related business entity,
except with a common suffix. In the example above this is simply “Details”.
Not everyone needing to investigate this area of the data warehouse will have access to a modelling tool.
Some might be using a database query tool, others reporting products. These will typically present the user
with just a list of tables or objects.
Adopting naming standards makes it easier to identify the type of entity, table or object.
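With the “Rel” prefix and “Details” suffix in place, a plain list of table names can be sorted into types without a modelling tool. The sketch below assumes those two conventions and uses hypothetical table names.

```python
def classify_table(name: str) -> str:
    """Classify a table by naming convention: 'Rel' prefix or 'Details' suffix."""
    if name.startswith("Rel"):
        return "relationship"
    if name.endswith("Details"):
        return "details"
    return "entity"


# Hypothetical list of tables, as a query tool might present it.
tables = ["Customer", "CustomerDetails", "Account", "AccountDetails", "RelCustomerAccount"]

tables_by_type = {}
for table in tables:
    tables_by_type.setdefault(classify_table(table), []).append(table)
```

A prefix this short can collide with ordinary names (an entity called "Release" would be read as a relationship), so a stricter convention such as a separator after the prefix may be worth considering.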
Phase 1+ - Growing up
Adapting to change doesn’t necessarily involve major rework or adjustments.
By adopting the basic constructs explained above, changes to the model are generally additions.
Consequently, existing code and constructs will continue to work without requiring any remedial action.
Obviously, if existing code and constructs are to make use of the new changes, they will need to be enhanced.
Making modifications to the model consistently is critical. Keep to the naming standards to ensure ease
of use.
During this phase, master and reference data sets begin to evolve. In Anchor Modelling, these are known as
knots. Consistent and unambiguous definitions of entities and their associated attributes and relationships
are very important.
The approach to enhancing the data warehouse remains the same, with requirements coming in from the
user community; restrictions or mandates from the source systems; and the architects and those responsible
for standards trying to consistently join the two ends of the data warehouse together: “Data In and Data Out”.
Our Snapshot Recommendation
In the simplest terms, adopt:
An agile delivery approach for report and data mart design. This will help ensure the ongoing retention of
users and product owners;
A disciplined waterfall approach to the formal data acquisition process from source systems, to assist with
reliable and accurate provision of data; and
A hybrid of the two approaches in the middle.
Following this approach will help minimize the rework and downtime associated with the inevitable changes to
the underlying data structure.
Bibliography
Ambler, S. (2003). Agile Database Techniques: Effective Strategies for the Agile Software Developer. Wiley
Publishing.
Ambler, S. (2004). The Object Primer: Agile Model Driven Development with UML 2. Cambridge University
Press.
Collier, B. (2011, June 22). Agile Data Modeling: Evolving Toward Excellence. Retrieved from TDWI:
http://tdwi.org/articles/2011/06/22/agile-data-modeling.aspx
Giles, J. (2011). The Nimble Elephant. Amazon.
Inmon, B. (n.d.). Retrieved from Bill Inmon - Corporate Information Factory: http://www.inmoncif.com/home/
Kimball, R. (n.d.). Retrieved from Kimball Group: http://www.kimballgroup.com/about-kimball-group/
Linstedt, D. (n.d.). Retrieved from Dan Linstedt - Data Vault: http://danlinstedt.com/about/data-vault-basics/
Rönnbäck, L. (2011, May). Anchor Modelling with Bi-Temporal Data. Retrieved from Anchor Modelling:
http://www.anchormodeling.com/wp-content/uploads/2011/05/Anchor-Modeling-with-Bitemporal-Data.pdf
Silverston, L. (2009). The Data Model Resource Book: Universal Patterns for Data Modeling. John Wiley &
Sons Inc.
Silverston, L. (2012, Feb). The Las Vegas 2012 Conference. Retrieved from TDWI World Conference
Series: http://events.tdwi.org/events/las-vegas-world-conference-2012/Speakers/Speaker%20Window.aspx?SpeakerId=%7B75239155-BDF6-4082-8D0E-C5403A9E72BD%7D&ID=%7B1DF417B1-CDE2-4924-9752-BC0610A76F46%7D
Simsion, G. (2007). Data Modeling Theory and Practice. Technics Publications.
EY | Assurance | Tax | Transactions | Advisory
About EY
EY is a global leader in assurance, tax, transaction and
advisory services. The insights and quality services we
deliver help build trust and confidence in the capital markets
and in economies the world over. We develop outstanding
leaders who team to deliver on our promises to all of our
stakeholders. In so doing, we play a critical role in building a
better working world for our people, for our clients and for
our communities.
EY refers to the global organization, and may refer to one or
more, of the member firms of Ernst & Young Global Limited,
each of which is a separate legal entity. Ernst & Young
Global Limited, a UK company limited by guarantee, does
not provide services to clients. For more information about
our organization, please visit ey.com.
© 2016 Ernst & Young Australia.
All Rights Reserved.
This communication provides general information which is current at
the time of production. The information contained in this
communication does not constitute advice and should not be relied
on as such. Professional advice should be sought prior to any action
being taken in reliance on any of the information. Ernst & Young
disclaims all responsibility and liability (including, without limitation,
for any direct or indirect or consequential costs, loss or damage or
loss of profits) arising from anything done or omitted to be done by
any party in reliance, whether wholly or partially, on any of the
information. Any party that relies on the information does so at its
own risk. Liability limited by a scheme approved under Professional Standards Legislation.
eyc3.com
ey.com/analytics
Contact details:
EYC3 creates intelligent client organizations using data & advanced analytics.
Our team of data scientists, analysts,
developers, business consultants and
industry experts work with clients at all stages of their information evolution.