
INFORMATION THEORETIC METRICS FOR SOFTWARE ARCHITECTURES

Monday, March 30, 2001

Ali Mili

West Virginia University

ACKNOWLEDGEMENTS

Funded by NSF, under ITR program, for 2000-2003.

Collaboration with Dr H. Ammar (WVU), Dr M. Shereshevsky (WVU) and Dr Lionel Briand (Carleton U, Canada).

Co-funded by NASA IV&V, Fairmont, WV, for 2000-2001 (HCS).

SOFTWARE ARCHITECTURES: A KEY PARADIGM

Codifying Best Practices into recognizable abstractions.

Supporting various forms of Software Reuse (PLE, CBSE, COTS).

Architecture: Captures scope of reusable assets and inter-component protocols.

Quantifying Architectural Attributes

Intrinsic Attributes: The architecture as an artifact.

Extrinsic Attributes: The architecture as a blueprint.

PREMISES OF THE APPROACH

A Three Tier Quality Model.

A Three Dimensional Hierarchy of Metrics.

A Three Step Quantification Procedure.

A Three Pronged Analysis Methodology.

Three Tier Quality Model

Distinguishing between what we want to measure and what we can measure.

Qualitative Attributes, arbitrarily vague, arbitrarily (non)quantifiable.

Quantitative Factors, formally defined, arbitrarily difficult to compute, apprehend QA.

Computable Factors, easily computable, related to QF.

Three Dimensional Hierarchy of Metrics

Data vs. Control. Data flow, Control flow between and within components.

Static vs. Dynamic. Communication vocabulary vs. Communication language.

Coupling vs. Cohesion. Flow between vs. within components.

Three Step Quantification Procedure

Architectures to a Canonical Representation. Predefined architectural style in Rapide.

Canonical Representation to Random Variables. Information flow.

Random Variables to Metrics. Information Theory Functions (known properties, known interpretations).

Three Pronged Research Method

Analytical Methods. Elucidating cause-effect relationships.

Empirical Methods. Eliciting laws from empirical observations.

Experimental Methods. Validate relationships or laws against experimental data.

MODELING DECISIONS

Standardizing mapping from coupling to cohesion (cohesion as self coupling).

Standardizing mapping from Static to Dynamic (dynamic is language defined by static vocabulary).

Standardizing mapping from Random variable to Metric (Shannon’s entropy).

Modeling Decisions, II

Data vs. Control: Different ranges; possibly different correlations.

Dynamic vs. Static: Static is easier to compute, more reliable, but misses relevant aspects.

Cohesion as Self Coupling: Gives meaning to comparison (re: diagonality).

A QUALITY MODEL

A Canonical Architecture, Focal Point.

Predefined architectural style: Independent Components.

Predefined Notation: A subset of Rapide.

Distinction between:

Qualitative Attributes: relevant to architect, evade quantification.

Quantitative Factors: easy to define, evade derivation/estimation.

Qualitative Attributes

Intrinsic Attributes: Conceptual integrity; Completeness and Correctness; Feasibility.

Extrinsic Attributes: Run-time Properties (performance, availability, security, usability, functionality); Product Properties (testability, integrability, modifiability, portability, reusability).

Quantitative Factors: Error Propagation

Definition:

$EP(A,B) = P([B](x) \neq [B](x') \mid x \neq x')$.

Reflects the probability that an error generated by A (feeding into B) is propagated by B (vs. masked).

Relevance: fault tolerance.
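
The definition above can be illustrated by brute force on a toy component. This is a minimal sketch, assuming that the correct input x and the erroneous input x' are drawn independently and uniformly from a small finite set; the component `low_two_bits` and the helper `error_propagation` are hypothetical names, not part of the original slides.

```python
from itertools import product

def error_propagation(component_b, inputs):
    """Exact EP over a finite input set.

    Assumption (not from the slides): x and x' are independent and
    uniformly distributed over `inputs`.
    """
    differing = 0   # ordered pairs with x != x'
    propagated = 0  # among them, pairs where B's outputs also differ
    for x, x_err in product(inputs, repeat=2):
        if x == x_err:
            continue
        differing += 1
        if component_b(x) != component_b(x_err):
            propagated += 1
    return propagated / differing

# Hypothetical component B that keeps only the two low-order bits,
# so errors confined to the high-order bits are masked.
def low_two_bits(x):
    return x % 4

ep = error_propagation(low_two_bits, range(16))
print(f"EP = {ep:.3f}")
```

A component that forwards its input unchanged propagates every error, while a constant component masks every error; the toy component falls in between.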

Quantitative Factors: Change Propagation

Definition:

$CP(A,B) = P([B] \neq [B'] \mid [A] \neq [A'] \wedge [S] = [S'])$.

Probability that a change in A mandates a change in B.

Relevance: Perfective Maintenance.

Quantitative Factors: Requirements Propagation

Definition:

$RP(A,B) = P([B] \neq [B'] \mid [A] \neq [A'] \wedge [S] \neq [S'])$.

Probability that a change in A due to a requirements shift yields a change in B.

Relevance: Adaptive Maintenance.

Quantitative Factors: Design Propagation

Definition:

$DP(A,B) = P(B \neq B' \mid A \neq A' \wedge [S] = [S'])$.

Probability that a function-preserving change in A causes a change in B.

Relevance: Corrective Maintenance.

ARCHITECTURAL METRICS

A Hierarchy of Eight Metrics: Data and Control, Static and Dynamic, Coupling and Cohesion.

Four Matrices.

Validation will select; most likely combine.

Rationale: Static vs. Dynamic

Static Metrics: Entropy of the vocabulary of information flow within/ between components.

Dynamic Metrics: Entropy of the language generated from that vocabulary during a typical execution.

Rationale: Data vs. Control

Data Interchange: carried by messages, parameters, shared data, etc.

Usually high bandwidth.

Control Interchange: carried by method calls, synchronization signals, event notifications.

Usually low bandwidth.

Elements of Information Theory

Random variable X, over set E, probability distribution P. Abbrev: P(X=e) = p(e).

Shannon’s Entropy: $H(X) = -\sum_{e \in E} p(e) \log p(e)$.

Renyi’s Entropy: $N(X) = \frac{1}{1-\alpha} \log \sum_{e \in E} p(e)^{\alpha}$.

Other interesting functions: conditional entropies, joint entropies, etc.
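
Both entropy functions are short to compute for a finite distribution. The following is a minimal sketch using the standard textbook definitions, with logarithms in base 2 so that results are in bits; the function names and the two sample distributions are illustrative assumptions.

```python
import math

def shannon_entropy(probs):
    """H(X) = -sum p(e) log2 p(e), in bits; zero-probability terms add 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def renyi_entropy(probs, alpha):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1), in bits."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log2(sum(p ** alpha for p in probs if p > 0)) / (1 - alpha)

# Hypothetical distributions used only for illustration.
uniform8 = [1 / 8] * 8
skewed = [0.5, 0.25, 0.125, 0.125]

print(shannon_entropy(uniform8), renyi_entropy(uniform8, 2))
print(shannon_entropy(skewed), renyi_entropy(skewed, 2))
```

On a uniform distribution the two functions coincide; on a skewed one the Renyi entropy of order 2 is smaller than the Shannon entropy.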

Static Data Metrics

Static Data Coupling:

SDR(A,B): Random variable that represents the vocabulary of data transfer from A to B.

SDC(A,B): H(SDR(A,B)).

Static Data Metrics, II

SDR(A,B) is an integer over 32 bits, uniformly used: SDC(A,B) = 32 bits.

SDR(A,B) is three independent integer variables: SDC(A,B) = 96 bits.

SDR(A,B) is an integer representing a Boolean (a la C): SDC(A,B) = 1 bit.

SDR(A,B) is an array index 0..7, uniform usage: SDC(A,B) = 3 bits.
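
The four values follow from the entropy of a uniform distribution. The worked steps below are a sketch under two assumptions not stated on the slide: each of the three independent integers is itself a uniformly used 32-bit integer, and the Boolean takes its two values with equal probability.

```latex
\begin{align*}
\text{uniform over } N \text{ values:}\quad
  H(X) &= -\sum_{e \in E} \tfrac{1}{N} \log_2 \tfrac{1}{N} = \log_2 N \\
\text{32-bit integer:}\quad
  SDC(A,B) &= \log_2 2^{32} = 32 \text{ bits} \\
\text{three independent integers:}\quad
  SDC(A,B) &= H(X_1) + H(X_2) + H(X_3) = 3 \times 32 = 96 \text{ bits} \\
\text{Boolean:}\quad
  SDC(A,B) &= \log_2 2 = 1 \text{ bit} \\
\text{array index } 0..7\text{:}\quad
  SDC(A,B) &= \log_2 8 = 3 \text{ bits}
\end{align*}
```

The third line uses the additivity of entropy over independent random variables.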

Static Data Metrics, III

Static Data Cohesion:

SDR(A): shorthand for SDR(A,A).

Implicitly, SDR(A,A): data transferred from A to A: state space of A.

Static Data Coupling, Cohesion: a Static Data NxN Matrix. N: # of components.

Static Control Metrics

Static Control Coupling:

SCR(A,B): Random variable that represents the vocabulary of control transfer from A to B.

SCC(A,B): H(SCR(A,B)).

Static Control Metrics, II

SCR(A,B): A may call 8 methods of B, with equal likelihood: SCC(A,B) = 3 bits.

SCR(A,B): A may call 2 methods of B, with equal likelihood: SCC(A,B) = 1 bit.

SCR(A,B): A may call 1 method of B: SCC(A,B) = 0 bits. Dynamic control metrics will distinguish this case from 0 methods.

Static Control Metrics, III

Static Control Cohesion:

SCR(A): shorthand for SCR(A,A).

Implicitly: control flow within A: evades precise generic definition.

Static Control Coupling, Cohesion: Static Control Matrix.

Dynamic Metrics

Static Random Variable: SR.

Dynamic Random Variable: DR = plausible sequences on SR.

DDR: plausible data/parameter sequences.

DCR: plausible call/control sequences.

Dynamic Metrics, II

Normalizing Dynamic Metrics: If a sample execution produces a call sequence of 1000 method names, is it because traffic between A and B is intense, or because the data sample is large?

Reflect the 1st dimension, normalize the 2nd.

Dynamic Metrics, III

Normalizing for the Size of Data: Let $L_n$ be the sequence generated by a datum of size $n$. Rather than compute $H(L_n)$, we compute

$\lim_{n \to \infty} \left( H(L_{n+1}) - H(L_n) \right)$.

Whether this limit exists: under investigation.

$\lim_{n \to \infty} \frac{1}{n} H(L_n)$.
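
The behaviour of the two normalizations can be explored numerically on a small model. This is a minimal sketch, assuming a hypothetical two-state Markov model of method calls in which a datum of size n yields exactly n calls; the method names, the probabilities and the helper `sequence_entropy` are illustrative assumptions, not taken from the slides.

```python
import math
from itertools import product

# Hypothetical call-sequence model: a two-state Markov chain over
# method names, with an assumed initial distribution.
INITIAL = {"read": 0.5, "write": 0.5}
TRANSITION = {
    "read":  {"read": 0.9, "write": 0.1},
    "write": {"read": 0.5, "write": 0.5},
}

def sequence_entropy(n):
    """Exact H(L_n) in bits, by enumerating every call sequence of length n."""
    total = 0.0
    for seq in product(INITIAL, repeat=n):
        p = INITIAL[seq[0]]
        for prev, cur in zip(seq, seq[1:]):
            p *= TRANSITION[prev][cur]
        if p > 0:
            total -= p * math.log2(p)
    return total

entropies = [sequence_entropy(n) for n in range(1, 13)]
# H(L_{n+1}) - H(L_n): traffic added by one more unit of data size.
increments = [b - a for a, b in zip(entropies, entropies[1:])]
# (1/n) H(L_n): average traffic per unit of data size.
averages = [h / n for n, h in enumerate(entropies, start=1)]

for n, (inc, avg) in enumerate(zip(increments, averages), start=1):
    print(f"n={n:2d}  increment={inc:.4f}  average={avg:.4f}")
```

In this model the increments settle quickly, while the averages approach the same value more slowly. Whenever the limit of the increments exists, the limit of the averages exists too and equals it, since the average is a Cesaro mean of the increments.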

Dynamic Metrics, IV

Measuring the Size of Data: A Generic Procedure.

- Well founded ordering on data space,

- Transitive root,

- Stratify data space,

- Size of a datum: ordinal of its stratum.

Dynamic Metrics, V

Metrics Dependent on Choice of Ordering? Condition of Convergence Weeds Out Poor Choices of Ordering.

Binary Trees: Height, vs. Number of Nodes.

With number of nodes, limits are defined. Sequence increment: traffic generated by an extra node.

Dynamic Metrics, VI

Reflecting Dynamic Behavior: If A calls a single method in B, static control coupling is 0 bits. Dynamic control coupling is the entropy of the random variable that represents the length of the (unitary) call sequence: a meaningful non-trivial value.

Measures of Integrity

Ideal Matrix: High diagonal values; low values outside diagonal.

Absolute Diagonality: Distance to the subspace of diagonal matrices.

Relative Diagonality: Sine of the Angle between the matrix and the subspace of diagonal matrices.

Captures modularity of the architecture by single scalars 0 .. 1. Mixed blessing.
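
Both diagonality measures reduce to norm computations once a norm is fixed. The sketch below assumes the Frobenius norm and its inner product, which the slides do not specify; under that assumption the nearest diagonal matrix is the diagonal part of the matrix itself. The sample matrix is hypothetical.

```python
import math

def frobenius_norm(matrix):
    return math.sqrt(sum(v * v for row in matrix for v in row))

def off_diagonal_part(matrix):
    """Copy of the matrix with its diagonal entries set to zero."""
    return [[0.0 if i == j else v for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]

def absolute_diagonality(matrix):
    """Frobenius distance from the matrix to the nearest diagonal matrix."""
    return frobenius_norm(off_diagonal_part(matrix))

def relative_diagonality(matrix):
    """Sine of the angle between the matrix and the diagonal subspace."""
    norm = frobenius_norm(matrix)
    if norm == 0:
        raise ValueError("angle is undefined for the zero matrix")
    return absolute_diagonality(matrix) / norm

# Hypothetical 3x3 matrix of metrics in bits:
# cohesion on the diagonal, coupling off the diagonal.
metrics = [
    [32.0, 1.0, 0.0],
    [3.0, 96.0, 0.0],
    [0.0, 1.0, 8.0],
]
print(absolute_diagonality(metrics), relative_diagonality(metrics))
```

With this choice, a purely diagonal matrix scores 0 on both measures, and a matrix with an all-zero diagonal has relative diagonality 1, so lower values indicate a more modular architecture.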

DEPLOYMENT PLAN: Architectures to Rapide

UML as An Architectural Representation: Rules for extracting architectural information.

Five Architectural Styles

Style: Topology + Msg Data Types + Protocols.

Independent Components: event based systems; communicating processes.

Virtual Machines: Interpreter based. Example: Rule Based Systems.

DataFlow Architectures: Data triggers nodes. Examples: Batch; Pipe and Filter.

Data Centered Systems: Data Bases; Blackboard Systems.

Call/Return Architectures: Main/Sub; RPC; OO Systems; Layered Systems.

Rapide Paradigms/Constructs

Object Oriented Executable ADL.

Specifying and Prototyping Systems.

Collection of Interfaces, connections between interfaces, and formal constraints.

Three types of connections: pipeline (=>), agent (||>), and identification (to).

Execution model is event-based and supports concurrency of node executions.

DEPLOYMENT PLAN: Rapide to Random Variables

The most difficult/contentious/controversial issues.

Mapping a Rapide Architectural description into an NxN matrix of random variables.

Relies on information that is for the most part available at the architectural level: Data/Control Flow within and between nodes, with relevant probability distributions.

Eliciting Interchange Information

Data Flow within nodes: State Variables.

Data Flow between nodes: Message Passing, Parameters, Shared Data.

Control Flow between nodes: Exchange of method calls; event flow.

Control Flow within nodes: Debate.

Eliciting Probability Distributions

Specified Usage Probabilities.

Inferred Usage Probabilities (e.g. Stack).

Simulated Usage Probabilities.

Default Usage Probabilities (uniform over data type, over known subrange).
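
Two of these sources are easy to contrast in code: a distribution estimated from a simulated run, and the uniform default. This is a minimal sketch; the trace of stack operations and all function names are hypothetical and serve only to show how the choice of distribution changes the resulting entropy.

```python
import math
from collections import Counter

def empirical_distribution(observations):
    """Relative frequencies of the observed symbols (simulated usage)."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {symbol: count / total for symbol, count in counts.items()}

def uniform_distribution(symbols):
    """Default usage: every symbol of the known subrange is equally likely."""
    symbols = list(symbols)
    return {symbol: 1 / len(symbols) for symbol in symbols}

def entropy_bits(distribution):
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)

# Hypothetical trace of methods of a Stack invoked during a simulated run.
trace = ["push", "push", "pop", "push", "top", "pop", "push", "pop"]
simulated = empirical_distribution(trace)
default = uniform_distribution(["push", "pop", "top"])

print(f"simulated: {entropy_bits(simulated):.3f} bits")
print(f"default:   {entropy_bits(default):.3f} bits")
```

The uniform default gives the largest entropy for a given vocabulary, so it acts as an upper bound on any metric computed from specified, inferred or simulated probabilities over the same vocabulary.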

DEPLOYMENT PLAN: Random Variables to Metrics

Straightforward: Applying the Entropy function.

Subject to Validation: Shannon vs. Renyi. Perhaps other forms.

Selection of metrics formulas dependent on validation step. Anticipated: logical/numeric/probabilistic relationships between CM and QF.

AUTOMATION PLAN: Rapide to Random Variables

Syntax Directed Translation (Yacc-like) of Rapide declarations into Ensemble definitions.

Bare Rapide Parser, progressively extended.

Investigation: Probabilistic Annotations of Rapide, using closed (wrt aggregate declarations) Prob. Distribution vocabulary.

Outcome: A square matrix of random variables.

AUTOMATION PLAN: Random Variables to Metrics

Deriving Matrix of metrics from Matrix of random variables, using Shannon/Renyi.

Assessing Diagonality, other properties.

Assessing/Correlating/Providing Bounds for Quantitative Factors.

Apprehending/Providing Ratings for Qualitative Attributes.

VALIDATION PLAN: Analytical Validation

Validating Computable Metrics with respect to Quantitative Factors: Documented approximations.

Under Weak Hypothesis, found equality between EP(A,B) and Renyi entropy of SDR(A,B).

VALIDATION PLAN: Empirical Validation

Case Study: HCS (Hub Control Software, ISS); UML descriptions.

Map to Rapide, Compute Metrics.

Correlate with measurable propagation probabilities, in light of system logs.

Other examples: a Client Server, a Pacemaker, a KWIC index.

CONCLUSION AND PROSPECTS

A Three-Tier Quality Model.

A Three Dimensional Hierarchy of Metrics.

A Three-Step Quantification Procedure.

A Three-Pronged Methodology.

Preliminary Work; tentative/speculative.

Looks easier (nicer?) than it is.

Questions?…

A science is as old as its instruments of measurement.

Louis Pasteur.

One of the fundamental aims of Science has been and continues to be that of progressing from perceptions to measurements.

Lotfi A. Zadeh.