INFORMATION THEORETIC METRICS FOR SOFTWARE ARCHITECTURES
Monday, March 30, 2001
Ali Mili
West Virginia University
ACKNOWLEDGEMENTS
Funded by NSF, under the ITR program, for 2000-2003.
Collaboration with Dr H. Ammar (WVU), Dr M. Shereshevsky (WVU) and Dr Lionel Briand (Carleton U, Canada).
Co-funded by NASA IV&V, Fairmont, WV, for 2000-2001 (HCS).
SOFTWARE ARCHITECTURES: A KEY PARADIGM
Codifying Best Practices into recognizable abstractions.
Supporting various forms of Software Reuse (PLE, CBSE, COTS).
Architecture: Captures scope of reusable assets and inter-component protocols.
Quantifying Architectural Attributes
Intrinsic Attributes: The architecture as an artifact.
Extrinsic Attributes: The architecture as a blueprint.
PREMISES OF THE APPROACH
A Three Tier Quality Model.
A Three Dimensional Hierarchy of Metrics.
A Three Step Quantification Procedure.
A Three Pronged Analysis Methodology.
Three Tier Quality Model
Distinguishing between what we want to measure and what we can measure.
Qualitative Attributes, arbitrarily vague, arbitrarily (non)quantifiable.
Quantitative Factors (QF), formally defined, arbitrarily difficult to compute, capture the Qualitative Attributes (QA).
Computable Factors, easily computable, related to the Quantitative Factors.
Three Dimensional Hierarchy of Metrics
Data vs. Control. Data flow, Control flow between and within components.
Static vs. Dynamic. Communication vocabulary vs. Communication language.
Coupling vs. Cohesion. Flow between vs. within components.
Three Step Quantification Procedure
Architectures to a Canonical Representation. Predefined architectural style in Rapide.
Canonical Representation to Random Variables. Information flow.
Random Variables to Metrics. Information Theory Functions (known properties, known interpretations).
Three Pronged Research Method
Analytical Methods. Elucidating cause-effect relationships.
Empirical Methods. Eliciting laws from empirical observations.
Experimental Methods. Validating relationships or laws against experimental data.
MODELING DECISIONS
Standardizing mapping from coupling to cohesion (cohesion as self coupling).
Standardizing mapping from Static to Dynamic (dynamic is language defined by static vocabulary).
Standardizing mapping from Random variable to Metric (Shannon’s entropy).
Modeling Decisions, II
Data vs. Control: Different ranges; possibly different correlations.
Dynamic vs. Static: Static is easier to compute, more reliable, but misses relevant aspects.
Cohesion as Self Coupling: Gives meaning to comparison (re: diagonality).
A QUALITY MODEL
A Canonical Architecture, Focal Point.
Predefined architectural style: Independent Components.
Predefined Notation: A subset of Rapide.
Distinction Between
Qualitative Attributes: relevant to architect, evade quantification.
Quantitative Factors: easy to define, evade derivation/estimation.
Qualitative Attributes
Intrinsic Attributes: Conceptual integrity; Completeness and Correctness; Feasibility.
Extrinsic Attributes: Run-time Properties (performance, availability, security, usability, functionality); Product Properties (testability, integrability, modifiability, portability, reusability).
Quantitative Factors: Error Propagation
Definition: $EP(A,B) = P([B](x) \neq [B](x') \mid x \neq x')$.
Reflects the probability that an error generated by A (feeding into B) is propagated by B (vs. masked).
Relevance: fault tolerance.
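The definition of EP(A,B) can be illustrated with a minimal sketch. It assumes that component B is modeled as a plain Python function and that the correct value x and the erroneous value x' are drawn uniformly from a small finite domain; both are assumptions of this sketch, and the names `error_propagation`, `identity` and `parity` are hypothetical.

```python
from itertools import permutations

def error_propagation(b, domain):
    """Fraction of ordered pairs (x, x') with x != x' for which b(x) != b(x').

    Assumes x and x' are uniform over `domain` (an assumption of this
    sketch, not something stated in the slides).
    """
    pairs = list(permutations(domain, 2))  # every ordered pair with x != x'
    propagated = sum(1 for x, x2 in pairs if b(x) != b(x2))
    return propagated / len(pairs)

# Hypothetical components: one propagates every error, one masks many.
identity = lambda x: x
parity = lambda x: x % 2

domain = range(8)
ep_identity = error_propagation(identity, domain)
ep_parity = error_propagation(parity, domain)
```

A component that keeps all the information it receives propagates every error, while a component that discards information (here, everything but the parity) masks part of them.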
Quantitative Factors: Change Propagation
Definition: $CP(A,B) = P([B] \neq [B'] \mid [A] \neq [A'] \wedge [S] = [S'])$.
Probability that a change in A mandates a change in B.
Relevance: Perfective Maintenance.
Quantitative Factors: Requirements Propagation
Definition: $RP(A,B) = P([B] \neq [B'] \mid [A] \neq [A'] \wedge [S] \neq [S'])$.
Probability that a change in A due to a requirements shift yields a change in B.
Relevance: Adaptive Maintenance.
Quantitative Factors: Design Propagation
Definition: $DP(A,B) = P(B \neq B' \mid A \neq A' \wedge [S] = [S'])$.
Probability that a function preserving change in A causes a change in B.
Relevance: Corrective Maintenance.
ARCHITECTURAL METRICS
A Hierarchy of Eight Metrics: Data and Control; Static and Dynamic; Coupling and Cohesion.
Four Matrices.
Validation will select; most likely combine.
Rationale: Static vs. Dynamic
Static Metrics: Entropy of the vocabulary of information flow within/ between components.
Dynamic Metrics: Entropy of the language generated from that vocabulary during a typical execution.
Rationale: Data vs. Control
Data Interchange: carried by messages, parameters, shared data, etc.
Usually high bandwidth.
Control Interchange: carried by method calls, synchronization signals, event notifications.
Usually low bandwidth.
Elements of Information Theory
Random variable X, over set E, probability distribution P. Abbrev: P(X=e) = p(e).
Shannon’s Entropy: $H(X) = -\sum_{e \in E} p(e) \log p(e)$.
Renyi’s Entropy (of order $\alpha$, $\alpha \neq 1$): $N(X) = \frac{1}{1-\alpha} \log \sum_{e \in E} p(e)^{\alpha}$.
Other interesting functions: conditional entropies, joint entropies, etc.
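Both entropy functions are short to compute. The following is a minimal sketch, assuming base-2 logarithms (so that results are in bits) and a distribution given as a list of probabilities; the function names and the two sample distributions are hypothetical.

```python
import math

def shannon_entropy(probs):
    """H(X) = -sum p(e) log2 p(e), in bits; zero-probability outcomes are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def renyi_entropy(probs, alpha):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1), in bits."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log2(sum(p ** alpha for p in probs if p > 0)) / (1 - alpha)

uniform = [0.25] * 4
skewed = [0.5, 0.25, 0.125, 0.125]

h_uniform = shannon_entropy(uniform)
n_uniform = renyi_entropy(uniform, 2)
h_skewed = shannon_entropy(skewed)
n_skewed = renyi_entropy(skewed, 2)
```

On a uniform distribution the two functions agree; on a skewed one the Renyi entropy of order 2 is lower than the Shannon entropy, which is one reason the choice between them is left to validation.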
Static Data Metrics
Static Data Coupling:
SDR(A,B): Random variable that represents the vocabulary of data transfer from A to B.
SDC(A,B): H(SDR(A,B)).
Static Data Metrics, II
SDR(A,B): an integer over 32 bits, uniformly used: SDC(A,B) = 32 bits.
SDR(A,B): three independent integer variables: SDC(A,B) = 96 bits.
SDR(A,B): an integer representing a Boolean (a la C): SDC(A,B) = 1 bit.
SDR(A,B): an array index 0..7, uniform usage: SDC(A,B) = 3 bits.
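These four values follow from the fact that a uniform variable over k values has entropy log2(k), and that the entropies of independent variables add. A minimal sketch, assuming uniform usage in every case (for the Boolean, that both truth values are equally likely); the helper name is hypothetical.

```python
import math

def uniform_entropy_bits(num_values):
    """Entropy, in bits, of a variable uniformly distributed over num_values outcomes."""
    return math.log2(num_values)

sdc_int32 = uniform_entropy_bits(2 ** 32)           # one 32-bit integer
sdc_three_ints = 3 * uniform_entropy_bits(2 ** 32)  # independent variables: entropies add
sdc_boolean = uniform_entropy_bits(2)               # only two of the integer's values are used
sdc_index = uniform_entropy_bits(8)                 # array index 0..7
```

The Boolean case shows why the metric is based on usage rather than on the declared type: the variable occupies an integer, but carries one bit.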
Static Data Metrics, III
Static Data Cohesion:
SDR(A): shorthand for SDR(A,A).
Implicitly, SDR(A,A): data transferred from A to A: state space of A.
Static Data Coupling, Cohesion: a Static Data NxN Matrix. N: # of components.
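The matrix can be assembled as follows. This is a minimal sketch, assuming each flow is described by an explicit distribution over the values transferred; the component names `Sensor` and `Filter` and their distributions are hypothetical, chosen only to give round numbers.

```python
import math

def entropy_bits(distribution):
    """Shannon entropy, in bits, of a dict {value: probability}."""
    return sum(-p * math.log2(p) for p in distribution.values() if p > 0)

def static_data_matrix(components, flows):
    """N x N matrix whose entry [i][j] is SDC(components[i], components[j]).

    `flows` maps (source, target) to a distribution over transferred values;
    a missing pair means that no data flows, hence 0 bits.
    """
    return [[entropy_bits(flows.get((a, b), {})) for b in components]
            for a in components]

components = ["Sensor", "Filter"]
flows = {
    ("Sensor", "Sensor"): {s: 1 / 16 for s in range(16)},  # state space of Sensor
    ("Sensor", "Filter"): {r: 1 / 4 for r in range(4)},    # data sent to Filter
    ("Filter", "Filter"): {s: 1 / 8 for s in range(8)},    # state space of Filter
}
matrix = static_data_matrix(components, flows)
```

Diagonal entries are cohesion values (flow from a component to itself) and off-diagonal entries are coupling values.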
Static Control Metrics
Static Control Coupling:
SCR(A,B): Random variable that represents the vocabulary of control transfer from A to B.
SCC(A,B): H(SCR(A,B)).
Static Control Metrics, II
SCR(A,B): A may call 8 methods of B, with equal likelihood: SCC(A,B) = 3 bits.
SCR(A,B): A may call 2 methods of B, with equal likelihood: SCC(A,B) = 1 bit.
SCR(A,B): A may call 1 method of B: SCC(A,B) = 0 bits. Dynamic control metrics will distinguish from 0 methods.
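The three cases, and the fact that the static metric cannot tell one method from none, can be checked with a minimal sketch. It assumes that the vocabulary of control transfer is given as a mapping from method names to call probabilities; the method names are hypothetical.

```python
import math

def static_control_coupling(call_probabilities):
    """SCC(A,B) in bits, from a dict {method name: probability that A calls it}."""
    return sum(-p * math.log2(p) for p in call_probabilities.values() if p > 0)

eight_methods = {f"m{i}": 1 / 8 for i in range(8)}
two_methods = {"open": 0.5, "close": 0.5}
one_method = {"run": 1.0}
no_method = {}

scc_eight = static_control_coupling(eight_methods)
scc_two = static_control_coupling(two_methods)
scc_one = static_control_coupling(one_method)
scc_none = static_control_coupling(no_method)
```

Both `scc_one` and `scc_none` are 0 bits, which is the gap the dynamic control metrics are meant to close.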
Static Control Metrics, III
Static Control Cohesion:
SCR(A): shorthand for SCR(A,A).
Implicitly: control flow within A: evades precise generic definition.
Static Control Coupling, Cohesion: Static Control Matrix.
Dynamic Metrics
Static Random Variable: SR.
Dynamic Random Variable: DR = plausible sequences on SR.
DDR: plausible data/parameter sequences.
DCR: plausible call/control sequences.
Dynamic Metrics, II
Normalizing Dynamic Metrics: If a sample execution produces a call sequence of 1000 method names, is it because traffic between A and B is intense, or because the data sample is large?
Reflect the 1st dimension, normalize the 2nd.
Dynamic Metrics, III
Normalizing for the Size of Data: Let $L_n$ be the sequence generated by a datum of size n. Rather than compute $H(L_n)$, we compute
$\lim_{n \to \infty} (H(L_{n+1}) - H(L_n))$.
Whether this limit exists: under investigation.
$\lim_{n \to \infty} \frac{1}{n} H(L_n)$.
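Both normalizations can be tried on a toy model. The sketch below assumes, purely for illustration, that the sequence generated by a datum of size n is a length-n sequence over a two-symbol vocabulary produced by a Markov chain; the chain, its probabilities and the names `START` and `STEP` are hypothetical and do not come from the slides.

```python
import math
from itertools import product

# Hypothetical two-symbol source: every sequence starts with "a".
START = {"a": 1.0, "b": 0.0}
STEP = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.5, "b": 0.5}}

def sequence_entropy(n):
    """H(L_n) in bits, computed by enumerating every length-n sequence."""
    total = 0.0
    for seq in product("ab", repeat=n):
        p = START[seq[0]]
        for prev, cur in zip(seq, seq[1:]):
            p *= STEP[prev][cur]
        if p > 0:
            total -= p * math.log2(p)
    return total

entropies = [sequence_entropy(n) for n in range(1, 13)]
# First normalization: H(L_{n+1}) - H(L_n).
increments = [b - a for a, b in zip(entropies, entropies[1:])]
# Second normalization: (1/n) H(L_n).
averages = [h / n for n, h in enumerate(entropies, start=1)]
```

For this particular source both quantities approach the same value, with the increments settling much faster than the averages. Nothing here shows that the limit exists for sequences generated by real architectures.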
Dynamic Metrics, IV
Measuring the Size of Data: A Generic Procedure.
- Well founded ordering on data space,
- Transitive root,
- Stratify data space,
- Size of a datum: ordinal of its stratum.
Dynamic Metrics, V
Metrics Dependent on Choice of Ordering? Condition of Convergence Weeds Out Poor Choices of Ordering.
Binary Trees: Height, vs. Number of Nodes.
With number of nodes, limits are defined. Sequence increment: traffic generated by an extra node.
Dynamic Metrics, VI
Reflecting Dynamic Behavior: If A calls a single method in B, static control coupling is 0 bits. Dynamic control coupling is the entropy of the random variable that represents the length of the (unitary) call sequence: a meaningful non-trivial value.
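The contrast can be made concrete with a minimal sketch, assuming the length distribution is estimated from observed runs. The list `call_counts` is hypothetical illustration data, not a measurement, and `empirical_entropy` is a hypothetical helper.

```python
import math
from collections import Counter

def empirical_entropy(samples):
    """Shannon entropy, in bits, of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = len(samples)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical data: how many times A called B's single method in each run.
call_counts = [1, 2, 2, 3, 3, 3, 3, 4]

# Static view: the vocabulary has one word, so every observation is the same.
static_bits = empirical_entropy(["run"] * len(call_counts))
# Dynamic view: the length of the call sequence varies from run to run.
dynamic_bits = empirical_entropy(call_counts)
```

The static value is 0 bits, while the varying sequence length gives a positive value.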
Measures of Integrity
Ideal Matrix: High diagonal values; low values outside diagonal.
Absolute Diagonality: Distance to the subspace of diagonal matrices.
Relative Diagonality: Sine of the Angle between the matrix and the subspace of diagonal matrices.
Captures modularity of the architecture by single scalars 0 .. 1. Mixed blessing.
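Both measures have a closed form once a norm is fixed. The sketch below assumes the Frobenius norm and its inner product, which the slides do not specify. Under that assumption the closest diagonal matrix is obtained by zeroing the off-diagonal entries, so the distance is the norm of the off-diagonal part and the sine is that distance divided by the norm of the whole matrix. The two sample matrices are hypothetical.

```python
import math

def absolute_diagonality(matrix):
    """Frobenius distance from `matrix` to its closest diagonal matrix."""
    return math.sqrt(sum(v * v
                         for i, row in enumerate(matrix)
                         for j, v in enumerate(row) if i != j))

def relative_diagonality(matrix):
    """Sine of the angle between `matrix` and the subspace of diagonal matrices."""
    norm = math.sqrt(sum(v * v for row in matrix for v in row))
    if norm == 0:
        raise ValueError("the angle is undefined for the zero matrix")
    return absolute_diagonality(matrix) / norm

modular = [[8.0, 0.0], [0.0, 6.0]]  # all flow stays within components
tangled = [[3.0, 4.0], [0.0, 0.0]]  # most flow crosses component boundaries

abs_modular = absolute_diagonality(modular)
rel_modular = relative_diagonality(modular)
abs_tangled = absolute_diagonality(tangled)
rel_tangled = relative_diagonality(tangled)
```

With this convention, 0 corresponds to a perfectly diagonal matrix and values near 1 to a matrix dominated by coupling.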
DEPLOYMENT PLAN: Architectures to Rapide
UML as An Architectural Representation: Rules for extracting architectural information.
Five Architectural Styles
Style: Topology + Msg Data Types + Protocols.
Independent Components: event based systems; communicating processes.
Virtual Machines: Interpreter based. Example: Rule Based Systems.
DataFlow Architectures: Data triggers nodes. Examples: Batch; Pipe and Filter.
Data Centered Systems: Data Bases; Blackboard Systems.
Call/Return Architectures: Main/Sub; RPC; OO Systems; Layered Systems.
Rapide Paradigms/Constructs
Object Oriented Executable ADL.
Specifying and Prototyping Systems.
Collection of Interfaces, connections between interfaces, and formal constraints.
Three types of connections: pipeline (=>), agent (||>), and identification (to).
Execution model is event-based and supports concurrency of node executions.
DEPLOYMENT PLAN: Rapide to Random Variables
The most difficult/contentious/controversial issues.
Mapping a Rapide Architectural description into an NxN matrix of random variables.
Relies on information that is for the most part available at the architectural level: Data/Control Flow within and between nodes, with relevant probability distributions.
Eliciting Interchange Information
Data Flow within nodes: State Variables.
Data Flow between nodes: Message Passing, Parameters, Shared Data.
Control Flow between nodes: Exchange of method calls; event flow.
Control Flow within nodes: Debate.
Eliciting Probability Distributions
Specified Usage Probabilities.
Inferred Usage Probabilities (e.g. Stack).
Simulated Usage Probabilities.
Default Usage Probabilities (uniform over data type, over known subrange).
DEPLOYMENT PLAN: Random Variables to Metrics
Straightforward: Applying the Entropy function.
Subject to Validation: Shannon vs. Renyi. Perhaps other forms.
Selection of metrics formulas dependent on validation step.
Anticipated: logical/numeric/probabilistic relationships between CM and QF.
AUTOMATION PLAN: Rapide to Random Variables
Syntax Directed Translation (Yacc-like) of Rapide declarations into Ensemble definitions.
Bare Rapide Parser, progressively extended.
Investigation: Probabilistic Annotations of Rapide, using closed (wrt aggregate declarations) Prob. Distribution vocabulary.
Outcome: A square matrix of random variables.
AUTOMATION PLAN: Random Variables to Metrics
Deriving Matrix of metrics from Matrix of random variables, using Shannon/Renyi.
Assessing Diagonality, other properties.
Assessing/Correlating/Providing Bounds for Quantitative Factors.
Apprehending/Providing Ratings for Qualitative Attributes.
VALIDATION PLAN: Analytical Validation
Validating Computable Metrics with respect to Quantitative Factors: Documented approximations.
Under Weak Hypothesis, found equality between EP(A,B) and Renyi entropy of SDR(A,B).
VALIDATION PLAN: Empirical Validation
Case Study: HCS (Hub Control Software, ISS); UML descriptions.
Map to Rapide, Compute Metrics.
Correlate with measurable propagation probabilities, in light of system logs.
Other examples: a Client Server, a Pacemaker, a KWIC index.
CONCLUSION AND PROSPECTS
A Three-Tier Quality Model.
A Three Dimensional Hierarchy of Metrics.
A Three-Step Quantification Procedure.
A Three-Pronged Methodology.
Preliminary Work; tentative/speculative.
Looks easier (nicer?) than it is.