The Dagstuhl Middle Model: An Overview
description
Transcript of The Dagstuhl Middle Model: An Overview
The Dagstuhl Middle Model:An Overview
Timothy C. Lethbridge
SITE, University. of Ottawa
ateM 2003 - Victoria Timothy C. Lethbridge 2
MotivationInformation interchange is necessary among
reverse engineering tools• So researchers/companies can
Use each others’ tools Interpret each other’s data
Therefore we need one or more metamodels• (better if we can integrate and have just one metamodel)
We will use the term metamodel and schema interchangeably
ateM 2003 - Victoria Timothy C. Lethbridge 3
Various metamodels that have been used
• CDIF (obsolete now)• UML metamodel and the MOF• TA / RSF schemas
In the Canadian research community• Famix• Others
Typical problems• Too proprietary• Too complex• Don’t meet all reverse-engineering requirements
ateM 2003 - Victoria Timothy C. Lethbridge 4
Requirements not met by the UML metamodel
Effective handling of source-code level information such as:• Full procedure call hierarchy• Full details of variables and access information• Macros and other source code constructs
The UML metamodel focuses on things representable by UML• i.e. designed for forward-engineering
ateM 2003 - Victoria Timothy C. Lethbridge 5
Main types of metamodels for reverse engineering
Low-level (Abstract Syntax Graph)• Specific to a programming language• Can represent full details of code, allowing almost unlimited types of analysis
Mid-level• Relationships among program entities• More compact than low-level
Architectural• E.g. components, pipes and filters
Database
ateM 2003 - Victoria Timothy C. Lethbridge 6
The Dagstuhl Middle Model (DMM)Probably should from now on be called the
Dagstuhl Middle Metamodel
Initiated at the Dagstuhl Seminar on reverse engineering tool interoperability• Held Jan 22-26, 2001• Involved about 45 people• Ratified GXL 1.0
A syntactic representation for schemas• Started work on the four types of schemas
ateM 2003 - Victoria Timothy C. Lethbridge 7
DMM background continuedRepresents program and source elements and their
relationships• In a language-independent way• Suitable for typical OO and procedural languages
C, C++, Java, Fortran, Cobol, etc.• Extensions needed for languages with special features
E.g. Logic languages, scripting languages
Key initial contributors and inputs • University of Ottawa (TA++)• Sander Tichelaar; Berne (Famix)• Erhard Plödereder; Stuttgart (Bauhaus)• others
Web site: http://titan.cnds.unibe.ch:8080/Exchange/2
ateM 2003 - Victoria Timothy C. Lethbridge 8
Design issuesOn the next few slides we will look at some of the
issues considered when designing DMM
Most of these issues were resolved in the direction of providing• Simpler models• That capture adequate information• Enabling the reverse engineer to navigate objects and relationships in the system Once they have found information of interest they
may look at code or more detailed models
ateM 2003 - Victoria Timothy C. Lethbridge 9
Design issue 1Should you be able to regenerate the original
code given a middle model?
Decision: No• That would yield models of too great complexity for our target audience
• But this does not preclude a parser that generates a low level model, coupled to a tool that ‘lifts’ the data to a middle model
ateM 2003 - Victoria Timothy C. Lethbridge 10
Design issue 2Should a model represent original code or pre-
processed code?• Options:
Original code• More useful for browsing tools• What the reverse engineer really thinks about and needs to
modify• Can model pre-processor directives + macros
Pre-processed• Much easier to parse
Decision: Represent original code
ateM 2003 - Victoria Timothy C. Lethbridge 11
Design issue 3
How language-independent should the metamodel be?
Decision:• It should be able to abstractly represent all common relationships among elements• Even though in different languages …
… the elements may have different names … the elements may have subtly different
semantics
ateM 2003 - Victoria Timothy C. Lethbridge 12
Examples of language-independence-promoting abstractions
• The notion of type of a variable is preserved But different types of pointers and storage can be
abstracted out• Procedures, functions and methods can all be treated similarly• Classes, records and structures can be treated similarly• Etc.
ateM 2003 - Victoria Timothy C. Lethbridge 13
Design issue 4Should all references be resolved?
E.g. exactly what procedure is called by what procedure call
Resolved references• Not possible in the general case due to function pointers,
dynamic binding• Even in the non-OO case, requires full understanding of
language-dependent linkage rules• Often dependent on makefile
In practice, software engineers can understand most aspects of code without resolved references
ateM 2003 - Victoria Timothy C. Lethbridge 14
Design issue 5
What level of detail needs to be captured?• Assignments, control constructs?
No – that is for low-level ASG metamodels• Local variables?
Not normally at the middle level• Exact position of all references? (e.g. calls)
Existence of references in some object may be all that is needed
• Enough detail for full data flow analysis? No; only enough for limited DFA
ateM 2003 - Victoria Timothy C. Lethbridge 15
Design issue 6Do we separate representation of source entities from
representation of conceptual entities?
Source elements are chunks of source code• Needed for tools to point reverse engineers to where
conceptual entities are mapped to code• Also useful for representing source constructs such as
macros, conditional compilation, etc.
Such a separation has proved useful, but may not always be needed
ateM 2003 - Victoria Timothy C. Lethbridge 16
How to model DMM?
We follow the OMG convention of modeling DMM using a class diagram
Actually we use the subset of class diagrams called MOF (Meta Object Format)
ateM 2003 - Victoria Timothy C. Lethbridge 17
Top level of the hierarchy
ModelRelationship
SourceObject ModelObject
BehaviouralElement
ModelElement
StructuralElement
Relationship
*
SourceModelRelationship
SourceRelationship
*
ateM 2003 - Victoria Timothy C. Lethbridge 18
Modelelementhierarchy ModelRelationship {abstract}
Package
FormalParameter
position
ExecutableValue
SourceObject
{abstract}
ModelObject {abstract}
name
Routine
0..1 defines
BehaviouralElement {abstract}
Field
Variable
Value
ModelElement {abstract}
EnumeratedType
EnumerationLiteral
Type
StructuralElement {abstract}
Method
StructuredType
Class
1..*
*
0..1
0..1 declares*
visibility
isConstructor
isDestructor
isAbstract
isDynamicallyBound
isOverideable
inheritsFrom
**
isSubpackageOf
*
isOfType
hasValue
0..1isDefinedInTermsOf
CollectionType
size
*
0..1*
*
0..1
0..1
isSubclassable
isEnumerationLiteralOf
isFieldOf
isMethodOf
invokes
* *accesses
* *
isParameterOf
imports
*
*
0..1
*
ateM 2003 - Victoria Timothy C. Lethbridge 19
Relationshiphierarchy
Sets
IsPartOfSignatureOf
IsActualParameterOf
ModelElement
Invokes
IsFieldOf
Field
StructuredType
IsMethodOf
Method
Class
IsEnumerationLiteralOf
EnumerationLiteral
EnumeratedType
isPartOf Invokes
BehavioralElement
BehavioralElement
Accesses
BehaviouralElement
StructuralElement
IsOfType
StructuralElement
Type
Imports
IsReturnValueOf
Type
BehaviouralElement
IsParameterOf
FormalParameter
BehaviouralElement
TakesAddressOfComponent
TakesAddressOfObjectUsesObjectSetsObject
SetsComponent UsesComponent
TakesAddressOfUses
SourceRelationship
Class
LogicalPackage
ModelRelationship
Relationship
IsDefinedInTermsOf
Type
Type
Includes
ModelElement
SourceFile
Contains
SourceUnit
SourcePart
ModelElement
SourceObject
SourceObject
SourceFile
SourceModelRelationship
Defines Declares
SourceObject
ModelObject
IsExpansionOf
MacroDefinition
MacroExpansion
Describes
SourceObject
Comment
this class hierarchy
shows domain and
range of each
association class,
rather than link
attributes.
inheritsFrom
Class
Class
access
ateM 2003 - Victoria Timothy C. Lethbridge 20
Source objecthierarchy
SourceUnit
name
Resolvable
name
Definition
SourceFile
path
*
******
Declaration Reference
SourcePart
startLine
startChar
endLine
endChar
SourceObject
*
contains
includes
MacroDefinition
name
MacroArgument
name
ModelObject* *
MacroExpansion
isExpansionOf
*
Comment
*
describes
ateM 2003 - Victoria Timothy C. Lethbridge 21
Issues not handled by DMM that may concern reverse engineers
Multi-part references to members• E.g. in the expression a.b().c
How
Function pointers• Reverse engineers we have worked for really have wanted to understand what can call what via function pointers
Computed and aliased references
Templates, à la C++
ateM 2003 - Victoria Timothy C. Lethbridge 22
Syntactic carrier neutralityDMM can be represented using:• GXL: See http://www.gupro.de/GXL/• TA• XMI• RDF• Any other format capable of representing instances of classes and their associations
A tool can use whichever it likes, provided there are appropriate syntax translators
ateM 2003 - Victoria Timothy C. Lethbridge 23
DMM Development Status
General website• http://scgwiki.iam.unibe.ch:8080/Exchange/2.
University of Ottawa• Have C++ parser that generates DMM in GXL• URL: http://www.site.uottawa.ca:4322/dmm
Others are also experimenting with model
Future work• Many details to be decided based on initial experiences
ateM 2003 - Victoria Timothy C. Lethbridge 24
Topics for discussionHow useful is DMM as it stands? Is anything clearly
inadequate? What changes, variants or extensions are needed?
What other middle level metamodels are there and can we merge the ideas from these to create a common standard
Will this standard be adopted
What extensions might be useful?
Can DMM be reasonably integrated with other types of metamodels? (E.g. database, architectural, and ASG metamodels, with common classes at the intersection)
ateM 2003 - Victoria Timothy C. Lethbridge 25
(End)
ateM 2003 - Victoria Timothy C. Lethbridge 26
TheUMLMeta-model
Component
0..10..10..1
ownedElement
Comment
*
Link
Structure
1..*1..*1..*
EnumerationLiteral
ProgrammingLanguageTypeEnumerationPrimitive
******
1..*
*
Object
Parameter
defaultValuekind
InstanceNamespace
specialization*
parent
*child
GeneralizableElement
isRootisLeafisAbstract
Package
Model
SubsystemDataTypeInterfacedeploymmentLocation
* resident
*
AssociationClass
Class Node
owner
{ordered}
participant
*
specification
*
AssociationEnd
isNavigableaggregationmultiplicity
connection
2..*2..*2..*
Association
Method
body
Operation
isAbstract
0..1
qualifier *
Attribute
initialValue
{ordered}*
specification
******
* type
Generalization
discriminator
Feature
visibility
BehaviouralFeatureStructuralFeature
multiplicity
Classifier
ModelElement
name
Element
Relationship
*
*
type
importedElement
*
*
ateM 2003 - Victoria Timothy C. Lethbridge 27
A legacymid-levelmodel:TA++
* resolvedTo 0..1
potentiallyImports*
*
ImportExistence
InclusionExistence
0..1
typ0..10..10..1 type0..10..10..1
Resolvable
Declaration
TypeReference
Reference
1..*0..1
*
SourceFile
path
* ******
SourceElementName
string
StructuredTypeDef
DatumDef
EnumeratedTypeDef
DataUseExistence
Definition
SourcePart
startCharendChar
SourcePackage
*
SpecificSourceElement
***
0..10..1 * RoutineDef
EnumerationLiteralDef
FieldDef
SourceUnit
*
potentiallyIncludes *
*
*
potentiallyUsesType
*potentiallyCalls
* RoutineCallExistence
TypeUseExistence
TypeDef
ClassDef
ReferenceExistence
******
contains
...
**
potentiallyRefersTo
parameters
methods
makesReferenceTo
*
*
ateM 2003 - Victoria Timothy C. Lethbridge 28
Another input model: part of Famix
Access
InheritanceDefinition
FunctionMethodPackageClassInvocation
Object
Association Model Entity BehavioralEntity