Essential Strategies, Inc. – What Data Models Can’t Do – Copyright © 1998, Essential Strategies, Inc.
WHAT DATA MODELS CAN’T DO
David C. Hay
I USED TO THINK . . .
Your author discovered data models in the early 1980s, without recognizing that they were anything other than a convenient way of describing a database structure. Then, in the mid-1980s, he encountered Oracle's approach and discovered them to be very powerful in their ability to describe the structure and important aspects of a business. They proved to be much more expressive and useful as an analysis tool than data flow diagrams or anything else that had been available.
He got in trouble with developers, though, by thinking that analysis was finished when the data models were done. The developers, it turns out, still had to do a second analysis to find out what the business rules were. When are occurrences of this entity created? What conditions have to be met? How do you determine which account to charge? And so forth.
The data model clearly isn't enough. What it lacks is a formal way of describing the business rules that control the business (and therefore the model).
The information industry is beginning to address this shortcoming formally. The IBM user group GUIDE has established a project to articulate the requirements and possible approaches to specifying business rules. Ron Ross has written a comprehensive book on the subject. [1] Barbara von Halle has written a series of three articles for Database Programming and Design. [2]
Oracle UK is doing research to determine how formal documentation of business rules might be incorporated into future CASE tools.
This paper will describe some of the shortcomings of data modeling in describing business rules, and present some approaches to making up for these shortcomings.
PROBLEMS WITH DATA MODELS
A data model, for the most part, describes only the structure of information. It says very little about how that information can or must be used. Some rules are included: any “must be” relationship is a rule. Definitions are rules. The fact that an occurrence of a sub-type must be an occurrence of a super-type is a rule. That’s about it.
[1] Ronald G. Ross, The Business Rule Book (Boston: Database Research Group, Inc., 1994).
[2] Barbara von Halle, Database Programming and Design: “Back to the Business Rule Basics”, October, 1994; “Living by the Rules”, November, 1994; “Lessons to Learn from Tee-ball”, December, 1994.
There are at least four categories of things that cannot adequately be described in data models:
✓ Implied assumptions
✓ When optional relationships are activated
✓ How to keep multiple paths between entities consistent
✓ Derivations
These are described further, below.
Implied assumptions
Some business rules are so obvious that we just assume they have been asserted, when in fact they have not. Figure 1, for example, shows a simple case of recursion.
The model says that each ORGANIZATION may be composed of one or more ORGANIZATIONS, and that each ORGANIZATION may be part of one and only one ORGANIZATION. What it does not say, however, is that there is any restriction on the ability to specify that an ORGANIZATION may be part of itself. That is, you could, under the terms of this model, say that Denver South High School is part of Denver South High School. More subtly, you could say that Denver South High School is part of the Denver School District, which in turn is part of Denver South High School.
Figure 1: Recursion Assumption [3]
(Diagram. Entity: ORGANIZATION. Relationship labels: part of / composed of.)
The model does not prevent such loops.
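
Since the model itself cannot forbid such loops, the rule has to be checked somewhere else. The following is a minimal sketch of one way to do that, assuming the "part of" relationship is held in a simple dictionary; the names `parent_of` and `creates_loop` are hypothetical and are not part of the paper's models.

```python
from typing import Dict, Optional

# Hypothetical storage of the "part of" relationship:
# each ORGANIZATION maps to the one ORGANIZATION it is part of (or None).
parent_of: Dict[str, Optional[str]] = {
    "Denver South High School": "Denver School District",
    "Denver School District": None,
}


def creates_loop(parent_of: Dict[str, Optional[str]], child: str, new_parent: str) -> bool:
    """Return True if making `child` part of `new_parent` would create a loop."""
    seen = set()
    current: Optional[str] = new_parent
    # Walk up the hierarchy from the proposed parent.
    while current is not None and current not in seen:
        if current == child:
            return True  # the child is its own ancestor
        seen.add(current)
        current = parent_of.get(current)
    return False
```

With this check, making Denver School District part of Denver South High School is detected as a loop, as is making the school part of itself.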
Related to this is the example shown in Figure 2.
This model expands on the previous one, adding the notion that an ORGANIZATION is an example of an ORGANIZATION TYPE. That is, “the shipping department” is an example of a “department”, while “Coca Cola” is an example of a “corporation”. Just as an ORGANIZATION may be composed of one or more ORGANIZATIONS, as stated above, so too an ORGANIZATION TYPE may be composed of one or more ORGANIZATION TYPES.
What is assumed but not stated, however, is that the two hierarchies are consistent with each other. We assume, for example, that if “the shipping department” is an embodiment of the ORGANIZATION TYPE “department”, and a “department” is always part of a “corporation” and always composed of one or more “sections”, then “shipping department” will be part of “Coca Cola” (a “corporation”) and not the other way around. That is, the assumed rule is that the structure of ORGANIZATION TYPE is consistent with the structure of ORGANIZATION.
[3] The models in this paper are from: David C. Hay, Data Model Patterns: Conventions of Thought (New York: Dorset House Publishers, 1996).
But that rule cannot be stated in our model.
Optional relationships don't tell you when
Another shortcoming of data models when it comes to describing business rules is the inadequacy of optional relationships. These say that an occurrence of one entity may be related to one or more occurrences of another, but they do not say when or under what circumstances.
Figure 2: Related Recursions
(Diagram. Entities: ORGANIZATION, ORGANIZATION TYPE. Relationship labels: part of / composed of on each entity; an example of / embodied in between them.)
For example, in Figure 3, a physical ASSET may be accounted for in a financial ASSET ACCOUNT. The question is, when? While it is typical for the accounting to lag a while after the acquisition of the ASSET, there is probably a policy which says that (for example) within one calendar month of its acquisition, each ASSET must be accounted for in one and only one ASSET ACCOUNT. This cannot be shown here.
Figure 3: Asset Usage
(Diagram. Entities: ASSET (attribute: Actual unit cost), DISCRETE ITEM, INVENTORY, ACCOUNT (attribute: Balance), ASSET ACCOUNT, LIABILITY ACCOUNT, EQUITY ACCOUNT. Relationship labels: accounted for in / an accounting of.)
The same problem, but with a different twist, is shown in Figure 4. Here, normally, a WORK ORDER is carried out via one or more ACTIVITIES. Similarly, most of the time, an ACTIVITY is authorized by a WORK ORDER. The problem is that under certain circumstances, an ACTIVITY may be carried out that is not authorized by a WORK ORDER. Under what circumstances is that permitted? The model doesn’t say.
Figure 4: Activities and Work Orders
(Diagram. Entities: ACTIVITY, WORK ORDER. Relationship labels: authorized by / carried out via.)
Multiple paths require care
Another area where data models cannot express all that must be said is probably the most treacherous. Any time it is possible to navigate from one entity to another by means of two paths, it is possible to specify combinations of occurrences that are nonsense. In each case, a business rule is required to specify what combinations are valid.
For example, in Figure 5, a TEST REQUIREMENT from a particular SAMPLE METHOD is for a particular TEST TYPE. Meanwhile, each TEST must be conducted on a SAMPLE, which, in turn, must be drawn according to a particular SAMPLE METHOD.
There is nothing in the model, however, to assure that, given a SAMPLE METHOD used, the TESTS conducted are examples of the TEST TYPES that are the object of TEST REQUIREMENTS from the same SAMPLE METHOD. The SAMPLE METHOD may require one set of TEST TYPES, but the actual TESTS conducted may be completely different.
Figure 5: Test Requirements and Tests
(Diagram. Entities: SAMPLE METHOD, TEST REQUIREMENT, SAMPLE, TEST, TEST TYPE. Attribute shown: Medium. Relationship labels: the basis for / from; for / the object of; drawn according to / used to draw; conducted on / subject to; an example of / embodied in.)
Note that this raises an important point: It may be appropriate to reject any data entered that violate this condition. If in the real world, however, it sometimes happens that the wrong tests are given, this model is exactly correct. There may be a business policy against giving the wrong tests, in which case a system based on this model should alert the community that the wrong things are happening, but it should not necessarily prevent these wrong things from being recorded.
Note the distinction between enforcing business rules by constraining what is allowed to be entered, and enforcing them in the world by notifying people when they are being violated.
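
The two styles of enforcement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionaries, the sample data ("grab sample", "pH", and so on), and the function names are all hypothetical stand-ins for the entities in Figure 5.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data: the TEST TYPES that each SAMPLE METHOD requires
# (that is, its TEST REQUIREMENTS).
REQUIRED_TEST_TYPES: Dict[str, Set[str]] = {"grab sample": {"pH", "turbidity"}}
# Hypothetical data: the SAMPLE METHOD each SAMPLE was drawn according to.
SAMPLE_METHOD_OF: Dict[str, str] = {"S-1": "grab sample"}

Record = Tuple[str, str]  # (sample, test type) for one TEST


def is_required(sample: str, test_type: str) -> bool:
    """Follow both paths and check that they agree."""
    method = SAMPLE_METHOD_OF[sample]
    return test_type in REQUIRED_TEST_TYPES.get(method, set())


def record_strict(tests: List[Record], sample: str, test_type: str) -> None:
    """Enforce the rule by refusing to record a violating TEST."""
    if not is_required(sample, test_type):
        raise ValueError(f"{test_type} is not required for sample {sample}")
    tests.append((sample, test_type))


def record_and_alert(tests: List[Record], alerts: List[str],
                     sample: str, test_type: str) -> None:
    """Record what actually happened, and notify people of the violation."""
    tests.append((sample, test_type))
    if not is_required(sample, test_type):
        alerts.append(f"{test_type} on {sample} is not required by its sample method")
```

The first function keeps bad data out of the database; the second keeps the database faithful to the world and leaves the correction to people.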
In some cases inconsistencies are allowed. For example, in Figure 6, it is possible for a PERSON to be in an EMPLOYMENT with one ORGANIZATION, while that person's POSITION ASSIGNMENT may be to a POSITION defined by another ORGANIZATION. This is reasonable in the case where someone is hired in one department, but temporarily assigned to another.
Figure 6: Employment
(Diagram. Entities: PARTY, PERSON, ORGANIZATION, EMPLOYMENT, POSITION ASSIGNMENT, POSITION. Relationship labels: based on / the basis for; of / in; filled by / to; defined by / responsible for; with / the source of.)
In some cases, constraints can be represented in a data model. Figure 7 is a variation on Figure 5. In this case each TEST must be the carrying out of a TEST REQUIREMENT. This at least makes it impossible to specify a TEST that is not required by somebody. Unfortunately this still does not require the SAMPLE METHOD which is the basis for the TEST REQUIREMENT to be the same as the SAMPLE METHOD which was used to draw the SAMPLE which was subject to the TEST.
Figure 7: Test Requirements, again
(Diagram. Entities: SAMPLE METHOD, TEST REQUIREMENT, SAMPLE, TEST, TEST TYPE. Relationship labels: the basis for / from; for / the object of; drawn according to / used to draw; conducted on / subject to; the carrying out of / embodied in.)
Fundamentally, the data model cannot describe constraints among specific occurrences of entities. It can only describe constraints among the entity types.
No provision for derivations
A final shortcoming of data models, at least as they are usually drawn, is lack of attention to derived data.
Relational theory does not provide for computed columns, and data modeling has always tended to reflect this. In fact, however, using derived attributes is a very powerful way to present the logic behind complex database-wide calculations.
For example, Figure 8 shows the model for TIME SHEET ENTRY, the vehicle for recording the amount of time spent by a PERSON on either an ACTIVITY or a WORK ORDER. The CASE*Method as formally defined (and the CASE tools which support it) does not provide a formal way of documenting how labor costs are derived from the information in the model.
Figure 8: Labor Costs
(Diagram. Entities: PARTY, PERSON, ORGANIZATION, TIME SHEET ENTRY, ACTIVITY, WORK ORDER. Attributes shown: Charge rate, Hours worked. Relationship labels: authorized by / the authorization for; the responsibility of / responsible for; preparer of / prepared by; submitter of / by; charged with / charged to.)
BUSINESS RULES
What, then, is a business rule? According to the Business Rule GUIDE Project, it is a “definitional, structural, or conditional constraint upon the business.”
Barbara von Halle defines four kinds of business rules: [4]
✓ Definition: A business noun’s meaning or significance. These are usually documented behind a data model.
✓ Fact (type): An assertion that a particular noun has a property or plays a role, or that one or more objects (nouns) participate together in a relationship. These relationships are what a data model is mostly about.
✓ Constraint (type): A restriction of, or the validation rule about, a fact type’s population. These are statements about the business in the form of “must be” or “must not be”. Static constraints describe what may be so in any state of the database. Dynamic constraints restrict the possible transitions between database states. These should be documented along with the data model and often are not. A way of describing these on the data model drawing itself would be welcome.
✓ Derivation specification (type): A mechanism for deriving new fact types from existing fact types.
(“Type” may be used since we are talking here about the definition of a fact, constraint, etc., not each occurrence of it.)
[4] Barbara von Halle, “Living by the Rules”, Database Programming and Design, October, 1994, with contributions from G. M. Nijssen and T. A. Halpin, Conceptual Schema and Relational Database Design (Sydney: Prentice Hall, 1989).
DEFINITIONS
What do we mean by PARTY? CONTRACT? EXPENSE ALLOCATION? The definition of an entity will place constraints around what may or may not be a legal occurrence of that entity. Typically, definitions are delivered along with a data model, to define the entities which appear in that model.
FACTS
An entity model describes facts about a business. Specifically, a data model can represent the following kinds of facts:
✓ Each occurrence of an entity may or must have “the property of”... That is, it may be described by an attribute.
✓ Each occurrence of an entity may or must be “a kind of”... That is, it is a sub-type of, or encompassed in the meaning of, another entity.
✓ Each occurrence of an entity may or must “play the role of”... That is, it is related in some way to another term. Some relationships are particularly common, such as:
    Each occurrence of an entity may or must be “composed of” another entity.
    Each occurrence of an entity may or must be “the embodiment of” another (“type”) entity.
There are, of course, many others.
CONSTRAINTS
Constraints, in general, are not handled well by data models. While data models can describe what “may be” true, they can only describe a few kinds of things that “must be” or “must not be” true. This is true of all flavors of data modeling except NIAM, which does include an extensive facility for describing constraints. [5]
As described above, Ron Ross has developed a comprehensive approach to documenting the constraints that would apply to a data model graphically, on the data model itself.
[5] The only good book in English describing NIAM is Messrs Nijssen and Halpin, referenced above.
He sees two basic kinds of rule elements:
✓ One describes an integrity constraint. This is something that must be true about an entity, relationship, or attribute, by definition.
✓ The other describes a condition. This may be true or false. Depending on the condition, other constraints may apply.
Each rule describes the effect of a constraining object upon a constrained object. On the left side of Figure 9 is an integrity constraint, represented by the arrowhead with the “X” in it. (This is a modification to Mr. Ross’s notation, to adapt it to the CASE*Method. In his original notation, attributes are shown in circles, outside the entity box.) An integrity constraint must be true. The constraint shown says that any occurrence of CUSTOMER must have a value for the attribute “address”. CUSTOMER is the constrained object, and “address” is the constraining object. The “X” in the symbol is one of 42 rule types that Mr. Ross describes. It indicates the rule type “mandatory”. That is, it asserts that the constrained object “must have” an occurrence of the constraining object.
On the right side of Figure 9 is a condition. This says that if an occurrence of CUSTOMER has a value for the attribute “address”, then carry out the next part of the rule. Note that rule elements can indeed be strung together.
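Figure 9: A Business Rule
(Diagram, drawn twice: entity PARTY with attribute “address” and a rule symbol marked “X”; the integrity constraint is on the left and the condition on the right.)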
Each of the 42 rule types may be used as an integrity constraint or a condition to describe a particular situation. Mr. Ross contends that this list of rule types is atomic and fundamental. While the list is extensive, Mr. Ross does not claim that it is exhaustive. There are undoubtedly more to be discovered. The list is divided into nine categories:
✓ Instance verifiers, requiring an occurrence of an entity to be present at the creation of, during the life of, etc. an occurrence of another entity.
✓ Type verifiers, requiring occurrences of entities to be mutually exclusive, mutually dependent, etc.
✓ Sequence verifiers, requiring occurrences to be created in a particular order.
✓ Position selectors, requiring reference to the lowest value of an attribute, or the highest value, or the earliest, etc.
✓ Functional evaluators, requiring occurrences of an entity to be unique, ascending, fluctuating, etc.
✓ Comparative evaluators, allowing for comparisons (less than, equal, etc.) between the attributes of two occurrences of the same or different entities.
✓ Calculators, used in a rule to derive values used by the rule.
✓ Update controllers, determining whether a value must be forever fixed, updatable, etc.
✓ Timing controllers, controlling the timing of events.
These rule types can be applied, one at a time or in groups, to the data models described above, thus adding the rules necessary to complete them:
Implied assumptions
The problem of ORGANIZATIONS being composed of themselves is handled by the inverse of a “mandatory” rule. The comparative evaluator condition “EQ”(ual) establishes that under certain circumstances the “names” of two occurrences of ORGANIZATION will be equal. (The circle with the slash negates the “EQ” rule, making it “must not be”.) Specifically, the mandatory integrity constraint, “X”, requires this to be true. (Or rather, not true, given the negation qualifier.)
Figure 10: Recursion Assumption
(Diagram. Entity: ORGANIZATION, with identifying attribute Name. Relationship labels: part of / composed of. Rule symbols: X, EQ.)
The multi-level hierarchy is handled by a “restrictive” constraint. (See Figure 11.) This means that an occurrence of the relationship being constrained may only exist if specified conditions are met. Specifically, the numbered circles require that the other relationships be traversed in sequence to arrive at the same entity occurrences as those in the constrained relationship. In this case, the constraint says that the composed of relationship is constrained by three navigation steps. If you want to say that one occurrence of ORGANIZATION is part of another, first look at which ORGANIZATION TYPE this occurrence is an example of. Second, look to see which ORGANIZATION TYPE the ORGANIZATION TYPE is part of. Third, look to see the occurrences of ORGANIZATION that this second ORGANIZATION TYPE is embodied in. The occurrence of ORGANIZATION on the part of side of the new relationship has to be one of those.
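
The three navigation steps can be followed literally in code. The sketch below is an illustration only; the dictionaries `org_type_of` and `type_part_of`, their contents, and the function name are hypothetical, and each ORGANIZATION TYPE is assumed to be part of at most one other type.

```python
from typing import Dict

# Hypothetical data: the ORGANIZATION TYPE each ORGANIZATION is an example of.
org_type_of: Dict[str, str] = {
    "Coca Cola": "corporation",
    "shipping department": "department",
}
# Hypothetical data: the ORGANIZATION TYPE each ORGANIZATION TYPE is part of.
type_part_of: Dict[str, str] = {"department": "corporation"}


def may_be_part_of(child: str, parent: str) -> bool:
    """Check a proposed 'part of' relationship against the type hierarchy."""
    # Step 1: which ORGANIZATION TYPE is the child an example of?
    child_type = org_type_of[child]
    # Step 2: which ORGANIZATION TYPE is that type part of?
    parent_type = type_part_of.get(child_type)
    if parent_type is None:
        return False
    # Step 3: which ORGANIZATIONS is that second type embodied in?
    candidates = {org for org, kind in org_type_of.items() if kind == parent_type}
    return parent in candidates
```

Under these assumptions, "shipping department" may be part of "Coca Cola", but not the other way around.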
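Figure 11: Related Recursions
(Diagram. Entities: ORGANIZATION, ORGANIZATION TYPE. Relationship labels: part of / composed of on each entity; an example of / embodied in between them. Rule symbols: R, with numbered circles 1, 2, 3.)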
Optional relationships
The ASSET / ASSET ACCOUNT example of optional relationships is addressed in Figure 12. This rule makes use of the condition symbol (the one that looks like a bicycle seat). The particular rule type is “TI”, which means “elapsed time”. Its qualifiers are “L”, which means set the time as a lower limit, and “1 wk”, which is the amount of time involved. Specifically, this is a test to see if at least one week has passed since the creation of an occurrence of ASSET. Since it is a condition, if it tests true, another rule is applied: in this case, the “X” for “mandatory”. If at least one week has passed, an occurrence of accounted for must be created. That is, each ASSET must be accounted for by one and only one ASSET ACCOUNT within one week of the creation of the ASSET.
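
Read as logic, the rule is a condition (“TI”) that switches on a mandatory constraint (“X”). A minimal sketch follows, assuming each ASSET carries a creation date and an optional ASSET ACCOUNT; the function and parameter names are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

# The "1 wk" qualifier, used as a lower limit ("L") on elapsed time.
GRACE_PERIOD = timedelta(weeks=1)


def violates_accounting_rule(created_on: date,
                             asset_account: Optional[str],
                             today: date) -> bool:
    """Return True if the ASSET should be accounted for by now but is not."""
    # Condition "TI": has at least one week passed since creation?
    condition_met = (today - created_on) >= GRACE_PERIOD
    # Integrity constraint "X": if so, an ASSET ACCOUNT is mandatory.
    return condition_met and asset_account is None
```

An asset created on 1 January with no account is acceptable on 5 January but in violation from 8 January onward.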
Figure 13 shows our other example of an optional relationship. In this case, it was first necessary to expand the model to clarify the terms required to establish the constraint. It turns out (in this imaginary world) that if an ACTIVITY is not authorized by a WORK ORDER, it must be authorized by a PERSON. That person may be the holder of a particular POSITION. We want to constrain the “authorized by one PERSON” relationship, so that it can only be established if the PERSON is holder of a POSITION whose “name” is “supervisor”.
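Figure 12: Asset Usage
(Diagram. Entities: ASSET (attribute: Actual unit cost), DISCRETE ITEM, INVENTORY, ACCOUNT (attribute: Balance), ASSET ACCOUNT, LIABILITY ACCOUNT, EQUITY ACCOUNT. Relationship labels: accounted for in / an accounting of. Rule symbols: TI with qualifiers L and 1 wk; X.)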
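Figure 13: Activities and Work Orders
(Diagram. Entities: ACTIVITY, WORK ORDER, PERSON, POSITION (attribute: Name). Relationship labels: authorized by / carried out via; authorized by / authorizor of; held by / holder of. Rule symbols: X; EQ with the value "supervisor".)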
Figure 14: Test Requirements and Tests
(Diagram. Entities: SAMPLE METHOD, TEST REQUIREMENT, SAMPLE, TEST, TEST TYPE. Attribute shown: Medium. Relationship labels: the basis for / from; for / the object of; drawn according to / used to draw; conducted on / subject to; an example of / embodied in. Rule symbol: R.)
To do this, we say that an occurrence of the relationship can only be established if the “equal to” condition pointed to by the “X” is true. That is, the condition asking whether the “name” attribute of an occurrence of POSITION is “EQ”(ual to) the value “supervisor” must be “true”. That is, an ACTIVITY may be authorized by a PERSON only if that PERSON is holder of the POSITION of “supervisor”.
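
The same rule can be sketched as a guard on establishing the relationship. Everything named here (the `positions_held` data, the people in it, and both functions) is hypothetical and serves only to illustrate the “EQ” condition feeding the “X” constraint.

```python
from typing import Dict, Set

# Hypothetical data: names of the POSITIONS each PERSON is holder of.
positions_held: Dict[str, Set[str]] = {
    "Pat": {"supervisor"},
    "Lee": {"clerk"},
}


def may_authorize(person: str) -> bool:
    """Condition "EQ": does the PERSON hold a POSITION named "supervisor"?"""
    return "supervisor" in positions_held.get(person, set())


def authorize_activity(activity: dict, person: str) -> None:
    """Establish 'authorized by' only when the condition is true."""
    if not may_authorize(person):
        raise ValueError(f"{person} does not hold the position of supervisor")
    activity["authorized_by_person"] = person
```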
Multiple paths require care
The “restricted” rule type is specifically intended to deal with multiple relationships between the same entities. In our first example (Figure 14), the establishment of a TEST conducted on a SAMPLE is restricted to those SAMPLES whose SAMPLE METHOD is the basis for a TEST REQUIREMENT for a TEST TYPE that is embodied in the TEST.
The “R” type integrity constraint constrains the relationship showing the SAMPLE that a TEST is conducted on with the relationship specifying the TEST TYPE that TEST is an example of, and the relationship specifying that a TEST REQUIREMENT for that TEST TYPE is from the SAMPLE METHOD used to draw the SAMPLE.
As stated above, in some cases inconsistencies are allowed. In Figure 6 (repeated as Figure 15) there is no need to add any constraints, since it is permitted for the defined by relationship to point to a different ORGANIZATION than that which the EMPLOYMENT is with.
Adding constraints in the data model itself does not significantly affect the constraints described by this notation. In Figure 16 (a variation on Figure 14, where TEST REQUIREMENT has been inserted between TEST and TEST TYPE), the “restricted” rule type can be applied to the one relationship showing that a TEST is the carrying out of a TEST REQUIREMENT. As in Figure 14, this says that a TEST may be carried out on a SAMPLE only if it is the carrying out of a TEST REQUIREMENT which is from the same SAMPLE METHOD that the SAMPLE is from.
Figure 15: Employment
(Diagram. Entities: PARTY, PERSON, ORGANIZATION, EMPLOYMENT, POSITION ASSIGNMENT, POSITION. Relationship labels: based on / the basis for; of / in; filled by / to; defined by / responsible for; with / the source of.)
Figure 16: Test Requirements, again
(Diagram. Entities: SAMPLE METHOD, TEST REQUIREMENT, SAMPLE, TEST, TEST TYPE. Relationship labels: the basis for / from; for / the object of; drawn according to / used to draw; conducted on / subject to; the carrying out of / embodied in. Rule symbol: R.)
No provision for derivations
Mr. Ross’ notation provides for calculated field values. There are three constraint types (“SUM”, “SUB”, and “MULT”) in his “Calculators” rule category. It is not necessary to use these, however, to show derivations on a data model. Figure 17 shows how derived attributes can be represented on the model itself. The calculations are not shown on the drawing (as they are when Mr. Ross’ notation is used), and unfortunately CASE*Designer allows derived attributes to be specified only as text.
In our example, since “charge rate” is an attribute of the PERSON who is the submitter of the TIME SHEET ENTRY, it is possible to define the “cost” of the time worked as “hours worked” (in TIME SHEET ENTRY), times “charge rate” (in PERSON, inherited from PARTY).
Once the “cost” has been calculated for one TIME SHEET ENTRY, it is a simple matter to sum that attribute across all the TIME SHEET ENTRIES charged to an ACTIVITY, in order to determine the “labor cost” of the activity. The sum of the “labor costs” of all the ACTIVITIES authorized by a WORK ORDER, plus the sum of the “costs” of all the TIME SHEET ENTRIES directly charged to the WORK ORDER, yields the “total labor cost” of the WORK ORDER.
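
The chain of derivations described above can be written out directly. The following is a sketch under simplifying assumptions: the class and function names are hypothetical, the submitter's “charge rate” is copied onto each entry, and activities and work orders are identified by plain strings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimeSheetEntry:
    hours_worked: float
    charge_rate: float                 # from the PERSON who submitted the entry
    activity: Optional[str] = None     # ACTIVITY the entry is charged to, if any
    work_order: Optional[str] = None   # WORK ORDER charged directly, if any

    @property
    def cost(self) -> float:
        """(Cost) = hours worked times charge rate."""
        return self.hours_worked * self.charge_rate


def labor_cost(entries: List[TimeSheetEntry], activity: str) -> float:
    """(Labor cost) of an ACTIVITY: sum of the costs charged to it."""
    return sum(e.cost for e in entries if e.activity == activity)


def total_labor_cost(entries: List[TimeSheetEntry], work_order: str,
                     authorized_activities: List[str]) -> float:
    """(Total labor cost) of a WORK ORDER: its activities plus direct charges."""
    via_activities = sum(labor_cost(entries, a) for a in authorized_activities)
    direct = sum(e.cost for e in entries if e.work_order == work_order)
    return via_activities + direct


entries = [
    TimeSheetEntry(hours_worked=2, charge_rate=50, activity="A1"),
    TimeSheetEntry(hours_worked=3, charge_rate=40, activity="A1"),
    TimeSheetEntry(hours_worked=1, charge_rate=100, work_order="WO1"),
]
```

Each derived attribute is defined only in terms of attributes and relationships already in the model, which is what makes the derivation a documentable business rule.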
Figure 17: Labor Cost
(Diagram. Entities: PARTY, PERSON, ORGANIZATION, TIME SHEET ENTRY, ACTIVITY, WORK ORDER. Attributes shown: Charge rate, Hours worked, and the derived attributes (Cost), (Labor cost), (Total labor cost). Relationship labels: authorized by / the authorization for; the responsibility of / responsible for; preparer of / prepared by; submitter of / by; charged with / charged to; composed of / part of.)
About the Author . . .
A twenty-five year veteran of the Information Industry, Dave Hay has been producing data models to support strategic information planning and requirements planning for over eight years. He has worked in a variety of industries, including, among others, power generation, clinical pharmaceutical research, oil refining, forestry, and broadcast. He is President of Essential Strategies, Inc., a consulting firm dedicated to helping clients define corporate information architecture, identify requirements, and plan strategies for the implementation of new systems.
He is the author of the book from Dorset House Publishers, Data Model Patterns: Conventions of Thought. He may be reached at 713-464-8316 or [email protected]. His company’s web page is at www.essentialstrategies.com.