Scott N. Woodfield David W. Embley Stephen W. Liddle Brigham Young University.

Page 1:

Improving Quality Through Increased Fidelity

Scott N. Woodfield, David W. Embley, Stephen W. Liddle
Brigham Young University

Page 2:

Context
• Software analysis/specification
• Conceptualization of information in the social domain
  • Often associated with people and the information about people
  • Incomplete, inconsistent, inaccurate
  • Examples: medicine, law, and history
• Need models of quality

Page 3:

The Problem: Fidelity

Page 4:

Our 7-Layer Organization
• Based on a theoretical foundation suggested by Charles T. Meadow in Text Information Retrieval Systems
• Layers:

1. Symbol layer
2. Class layer
3. Information layer
4. Knowledge layer
5. Evidence layer
6. Communication layer
7. Action layer

Page 5:

Knowledge Layer Problem 1: Valid Models
• A model is defined in terms of first-order predicate calculus
• A valid model is a populated model that is logically true
• Valid models cannot easily represent information in the social domain
  • Incomplete information
  • Inconsistent information
  • Inaccurate information

Page 6:

Example: Child has Mother

All constraints are participation constraints.

Child [1] -- has -- [1:*] Mother

Child [1:*] -- has -- [1:*] Mother

The Over-Relaxation Problem
• Cannot detect and notify

Page 7:

Allow Invalid Information But Detect and Notify

Child [1] -- has -- [1:*] Mother

• Allow insertion, modification, or deletion of invalid information
• Detect constraint violations
  • On insert
  • On modification
  • On deletion
  • On query
• Notify interested parties on constraint violation
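
The detect-and-notify idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the class name `ChildHasMother`, its methods, and the `notify` callback are all hypothetical, and only the Child-side participation constraint (exactly one mother) is checked. A modification can be treated as a delete followed by an insert.

```python
from collections import defaultdict

class ChildHasMother:
    """Hypothetical relationship set: stores any fact, reports violations."""

    def __init__(self, notify):
        self.mothers_of = defaultdict(set)  # child -> recorded mothers
        self.notify = notify                # callback for interested parties

    def _check(self, child):
        # Participation constraint on Child: exactly one mother.
        count = len(self.mothers_of[child])
        if count != 1:
            self.notify(f"{child} has {count} recorded mothers, expected 1")

    def insert(self, child, mother):
        self.mothers_of[child].add(mother)  # never rejected
        self._check(child)

    def delete(self, child, mother):
        self.mothers_of[child].discard(mother)
        self._check(child)

    def query(self, child):
        self._check(child)
        return set(self.mothers_of[child])

messages = []
rel = ChildHasMother(messages.append)
rel.insert("Elias", "Deborah Ely")  # consistent, no notification
rel.insert("Elias", "Lucinda Lee")  # stored anyway, but reported
```

The invalid second fact stays in the store; the violation is reported rather than prevented, and it is reported again whenever the child is queried.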

Page 8:

Knowledge Layer: Problem 2: Hard Constraints

Child [1] -- has -- [1:*] Mother

Child [1] -- has -- [1:69] Mother

Page 9:

Provide Soft Constraints

Child [1] -- has -- Mother

Mean: 2; Std. Dev.: 1; Likelihood Cutoff: 16; Validity Cutoff: 69
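
One possible reading of these four parameters is sketched below in Python. The slide does not give the scoring function, so the Gaussian-shaped score, the lower bound of one child, the status labels, and the function name `soft_children_per_mother` are all assumptions made for illustration.

```python
import math

def soft_children_per_mother(count, mean=2.0, std_dev=1.0,
                             likelihood_cutoff=16, validity_cutoff=69):
    """Return (score, status) for the number of children recorded for a mother.

    The score lies in [0, 1]. Counts beyond the validity cutoff are
    treated as invalid; counts beyond the likelihood cutoff are kept
    but flagged as unlikely.
    """
    # Assumed hard bounds: at least one child, at most validity_cutoff.
    if count < 1 or count > validity_cutoff:
        return 0.0, "invalid"
    z = (count - mean) / std_dev
    score = math.exp(-0.5 * z * z)  # assumed bell-shaped score, 1.0 at the mean
    status = "unlikely" if count > likelihood_cutoff else "plausible"
    return score, status

typical = soft_children_per_mother(2)
large_family = soft_children_per_mother(20)
impossible = soft_children_per_mother(70)
```

Unlike the hard 1:69 constraint, this version distinguishes a typical family from one that is valid but worth a second look.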

Page 10:

New Model of Constraints

[Diagram relating the constraint types: Function, Constraint, Hard Constraint, Soft Constraint, General Constraint, Cardinality Constraint, Co-Occurrence Constraint, Object-Set Cardinality Constraint, Participation Constraint]

General Constraints:

1. The domain of a Constraint is unrestricted
2. The co-domain of a Constraint = [0,1]
3. The domain of a Cardinality Constraint is a non-negative integer
4. The co-domain of a Hard Constraint = {0,1}

Page 11:

Constraints as Implications

Child [1] -- has -- Mother

Child(c) has Mother(m) ⇒ Child c has one and only one Mother

• All constraints may be viewed as implications
• For cardinality constraints the antecedent can be derived

Page 12:

Evidence Layer

Problem: How do you know an assertion is true?

Associate Evidence With Relations

Child [1] -- has -- Mother

Mean: 2; Std. Dev.: 1; Likelihood Cutoff: 16; Validity Cutoff: 69

Evidence: Probability: 80%; Source: Mother’s oral declaration; Date recorded: 4 Dec 1931
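
As a rough sketch of what associating evidence with a relation could look like as a data structure, the Python below attaches evidence records to a single relationship instance. The class and field names are hypothetical and simply mirror the three fields shown on the slide; the child and mother identifiers are placeholders.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Evidence:
    probability: float   # confidence that the assertion is true
    source: str          # where the assertion came from
    date_recorded: date  # when the evidence was recorded

@dataclass
class ChildHasMotherFact:
    child: str
    mother: str
    evidence: list = field(default_factory=list)  # zero or more Evidence items

# Placeholder identifiers; the evidence values are those from the slide.
fact = ChildHasMotherFact("Child c", "Mother m")
fact.evidence.append(
    Evidence(0.80, "Mother's oral declaration", date(1931, 12, 4))
)
```

Keeping evidence as a list allows several independent sources to support, or contradict, the same assertion.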

Page 13:

Evidence
• Informal vs. formal
  • Formal evidence requires a formal model
• Facts vs. derived information
  • Derived information requires logic
• Confidence information
  • Probability or confidence intervals

Page 14:

Communication Layer
• Information without communication is worthless
• We wish to focus on describing the problems
• Understanding the problem is half the battle
• Conceptual modeling point of view
• We don’t have implemented solutions yet

Page 15:

Communication Layer Problem: Understanding the Many Models
• The unified model assumption
  • Hinders sharing
• Models
  • Source model
  • Transport model
  • Destination model

Page 16:

Communication Layer Problem: Non-Isomorphic Models
• Problem 1: Focusing on transport representations doesn’t solve the real problem
• Problem 2: The source and destination models may be non-isomorphic
• Problem 3: The hardest problem is in the differences between meta-models for the source, transport, and destination models

Page 17:

Example of Communication Layer Problem
• We are working on automatic information extraction from semi-structured text
• Ontology directed
  • Class syntax definitions
• Linguistically enabled
  • Class differentiators
  • Relation extractors

Page 18:

Extraction Example

1555. Elias Mather, b. 1750, d. 1788, son of Deborah Ely and Rich- ard Mather; m. 1771, Lucinda Lee, who was b. 1752, dau. of Abner Lee and EHzabeth Lee. Their children : —
1. Andrew, b. 1772.
2. Clarissa, b. 1774.
3. Elias, b. 1776.
4. William Lee, b. 1779, d. 1802.
5. Sylvester, b. 1782.
6. Nathaniel Griswold, b. 1784, d. 1785.
7. Charles, b. 1787.
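
To give a feel for what a syntax-based recognizer over this kind of text might do, here is a small Python sketch that pulls the numbered child entries out of the passage with a regular expression. It is not the authors' ontology-directed extractor; the pattern `CHILD_ENTRY`, the function `extract_children`, and the output field names are assumptions chosen for this example.

```python
import re

# Assumed pattern for entries such as "4. William Lee, b. 1779, d. 1802."
CHILD_ENTRY = re.compile(
    r"(\d+)\.\s*([A-Z][A-Za-z ]*?), b\. (\d{4})(?:, d\. (\d{4}))?\."
)

CHILDREN_TEXT = """1. Andrew, b. 1772.
2. Clarissa, b. 1774.
3. Elias, b. 1776.
4. William Lee, b. 1779, d. 1802.
5. Sylvester, b. 1782.
6. Nathaniel Griswold, b. 1784, d. 1785.
7. Charles, b. 1787."""

def extract_children(text):
    """Return one record per numbered child entry found in the text."""
    return [
        {
            "order": int(order),
            "name": name,
            "born": int(born),
            "died": int(died) if died else None,  # death year is optional
        }
        for order, name, born, died in CHILD_ENTRY.findall(text)
    ]

children = extract_children(CHILDREN_TEXT)
```

A pattern like this only captures surface syntax. Deciding that these seven people are children of Elias Mather and Lucinda Lee is the kind of relation extraction the previous slide refers to.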

Page 19:

Problems
• Source model is OSMX
• Destination model is GEDCOM X
• Transport model is unknown
• Challenges
  • OSMX allows normal generalization/specialization; GEDCOM X is more restrictive
  • OSMX allows arbitrary n-ary relations; GEDCOM X is focused on people, relations between people, and events

Page 20:

Action Layer
• Behavior modeling
  • Too complex to discuss here
• Semi-automatic fidelity enhancement
  • What do we do when constraints are violated?
    • Human intervention (the common assumption)
    • Machine intervention
  • Automatic generation of rules based on the antecedent of constraints
    • Use of modus tollens
    • Consider every simple predicate as a violated constraint
  • Machine processing if possible
  • Human intervention if necessary

Page 21:

Fidelity Enhancement Example

By modus tollens the constraint:

Child(c) has BloodType(bc) ∧ Child(c) has Mother(m) ∧ Child(c) has Father(f) ∧ Mother(m) has BloodType(bm) ∧ Father(f) has BloodType(bf)
⇒ ProbabilityOfChildsBloodType(bc, bm, bf) > 0.0

Becomes:

¬(ProbabilityOfChildsBloodType(bc, bm, bf) > 0.0)
⇒ ¬(Child(c) has BloodType(bc)) ∨ ¬(Child(c) has Mother(m)) ∨ ¬(Child(c) has Father(f)) ∨ ¬(Mother(m) has BloodType(bm)) ∨ ¬(Father(f) has BloodType(bf))
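
The blood-type constraint above can be made concrete with a short Python sketch. It assumes simple ABO inheritance only (no Rh factor, no rare variants) and treats every genotype consistent with a parent's blood type as equally likely, which is a simplification. The function `probability_of_childs_blood_type` borrows its name from the slide's predicate; `phenotype` and `suspect_facts` are hypothetical helpers.

```python
from itertools import product

# Possible ABO genotypes for each blood type (Rh and rare variants ignored).
GENOTYPES = {
    "A": [("A", "A"), ("A", "O")],
    "B": [("B", "B"), ("B", "O")],
    "AB": [("A", "B")],
    "O": [("O", "O")],
}

def phenotype(allele1, allele2):
    """Blood type expressed by a pair of ABO alleles."""
    alleles = {allele1, allele2}
    if alleles == {"A", "B"}:
        return "AB"
    if "A" in alleles:
        return "A"
    if "B" in alleles:
        return "B"
    return "O"

def probability_of_childs_blood_type(bc, bm, bf):
    """Share of parental allele combinations that yield blood type bc."""
    outcomes = [
        phenotype(from_mother, from_father)
        for gm, gf in product(GENOTYPES[bm], GENOTYPES[bf])
        for from_mother, from_father in product(gm, gf)
    ]
    return outcomes.count(bc) / len(outcomes)

def suspect_facts(c, m, f, bc, bm, bf):
    """If the consequent fails, every antecedent fact becomes a suspect."""
    if probability_of_childs_blood_type(bc, bm, bf) > 0.0:
        return []
    return [
        f"Child({c}) has BloodType({bc})",
        f"Child({c}) has Mother({m})",
        f"Child({c}) has Father({f})",
        f"Mother({m}) has BloodType({bm})",
        f"Father({f}) has BloodType({bf})",
    ]

suspects = suspect_facts("c", "m", "f", "AB", "O", "O")
```

For an AB child recorded with two type O parents the probability is zero, so all five recorded facts are returned as candidates for machine processing or human review.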

Page 22:

Suggestions for Higher-Quality Social Models
• Allow invalid models
• Add soft constraints (distribution-based constraints)
• Add evidence structured using conceptual models
• Look at communication problems from the perspective of conceptual models
• Use recorded information to semi-automatically improve the model’s quality