Post on 01-Jan-2016
IBM Software Group | Rational software
© 2004 IBM Corporation2
Where are we (IBM) coming from?
We’re part of the Model-Driven Development (MDD) team in the Rational Software Division of the IBM Software Group
CDT is the core C++ component of Rational Software Architect
The C++ component includes visualization of code as UML as well as transformation of UML to C++
We need an accurate, complete DOM that allows programmatic code change
Since do both Java and C++ we’d like the architectures of the JDT and CDT to be “similar”.
IBM Software Group | Rational software
© 2004 IBM Corporation3
What have we done until now?
Started work on parser in Dec. 2002
Considered a number of options but settled on a handwritten parser Needed smart control over ambiguities in C/C++
Needed to support a number of clients with different needs
Similar in approach to newer versions of gcc.
First contributed in 1.1 for CModel replacing previous JavaCC-based parser
Added Indexing and Search as client in 1.2 Replaced previous ctags-based index (not without issue)
Numerous clients in 2.0, including content assist, F3, type browsers
IBM Software Group | Rational software
© 2004 IBM Corporation4
Parsing Architecture 1.0
We had concern over scalability of AST Settled on a callback-based architecture
Clients created only the data that they needed
Client would pass in a requestor to the parser and the parser would pass in parse info as the parse progressed Despite the names, the IAST* parse info does not form a proper AST
Client would also be responsible for passing in ScannerInfo Compiler command line arguments that affect parsing
• Include paths, macro defs Needed to properly parse
IBM Software Group | Rational software
© 2004 IBM Corporation5
Parsing Architecture 1.0 – What worked well
Pretty accurate parse information (usually) Content assist worked really well for C and C++
Accurate out line view
Accurate search results, F3
Ability to generate type info for class and hierarchy browsers
Flexibility for clients Enabled us to pile on the clients in 2.0
IBM Software Group | Rational software
© 2004 IBM Corporation6
Parser Architecture 1.0 – What didn’t work well
Performance Assumed parse times would be quick, < 1 sec
• Finding parse times usually in 2-4 second range Index takes a lot of time, memory, and CPU power to generate
Content assist times out regularly
Hard to provide accurate results consistently F3 and content assist often produce no results
Need for accurate ScannerInfo often hard to satisfy
Scalability to large projects Problems only worsen when project grows
Index times longer than already long build times
ScannerInfo different for different files in project
IBM Software Group | Rational software
© 2004 IBM Corporation7
Why a DOM?
Although flexible, the callback mechanism was hard to define Unable to provide all information that every possible client would require
Often unable to determine parser context for certain constructs
Clients had added complexity to manage their own data
Parser had added complexity to provide enough data
• Parse mode proliferation
Need to address scalability Cut down on the amount of parsing we need to do
Can we reuse parse results?
Need data structures to help programmatically make changes to the source code
IBM Software Group | Rational software
© 2004 IBM Corporation8
DOM Architecture
The DOM is composed of the parser/scanner and three levels of data 1) Physical AST with mapping back through macro expansions to source
2) Logical Scope/Binding tree with cross translation unit indexing
3) AST Rewriter
Firm up interfaces for DOM creation Navigate from Core Model to DOM
Hide parser from clients (use core model)
Allows us to play with how the DOM is created to improve scalability
IBM Software Group | Rational software
© 2004 IBM Corporation9
Goals for 3.0
To reduce the fixed cost of a parse (preprocess, scan & syntax matching) and to allow for lazy semantic evaluation Improve performance & reduce memory footprint of navigation features
To provide a “complete” physical AST which can make our clients aware of preprocessor macro expansions in source code
To provide better support for C Link-time resolution cross references
Tailored implementation of parser/semantic bindings
IBM Software Group | Rational software
© 2004 IBM Corporation10
Physical AST - IASTNode
Given a file to parse, the AST Framework shall return an IASTTranslationUnit which then can be traversed or visited
The IASTNode interface hierarchy represent constructs in the C/C++ grammar.
• long x; /* IASTSimpleDeclaration with 1 IASTDeclarator */• long y(); /* IASTSimpleDeclaration with 1 IASTFunctionDeclarator */• int f() { return sizeof( int ); } /* IASTFunctionDefinition */
Physical tree is unaware of any semantic knowledge Declaration before use (C++)
Scoping
Type compatibility
lValue vs. rValue
Allows for quick generation of syntax tree Only slight overhead to cost of scan & preprocess
IBM Software Group | Rational software
© 2004 IBM Corporation11
Logical Tree - IBinding
Logical elements are higher-level constructs that map onto a physical IASTNode
For 3.0 : IASTName#resolveBinding()• long x; /* IASTName for x resolves to an IVariable */• long y(); /* IASTName for y resolves to an IFunction */• int f() { return sizeof( int ); } /* IASTName for f resolves to an IFunction*/
Beyond 3.0 – IASTBinaryExpression could bind to an user defined operator
Semantic errors return an IProblemBinding describing the error
IASTTranslationUnit can be asked for all declarations or references of a particular IBinding
Bindings can be resolved completely lazily on request or in a full traversal of the AST Indexer would require full binding resolution
Most other clients do not require all bindings to be resolved
IBM Software Group | Rational software
© 2004 IBM Corporation12
Macro Awareness in the Physical AST
Nearly all parser clients in the CDT are concerned with offsets Selection
Search Markers
Navigation
Compare
Within a preprocessor macro expansion, our IScanner implementation massages the offsets as tokens arrive
However, the CDT 2.x AST Nodes are unaware as to whether or not they are a result of a macro expansion This deficiency affects different clients differently.
IBM Software Group | Rational software
© 2004 IBM Corporation14
The Bad – Search Markers Slightly Off
IBM Software Group | Rational software
© 2004 IBM Corporation16
Introducing IASTNodeLocation
Every node in the AST derives from IASTNode
Every IASTNode provides a mechanism for resolving its location in the physical AST External Header Files
Resources
Working Copies
Macro Expansions
getNodeLocations() returns an array of IASTNodeLocation, as a node may span multiple locations (when macros are involved)
Interpretation of macro expansion locations are up to the client
IBM Software Group | Rational software
© 2004 IBM Corporation17
Better support for C
Support for link-time bindings resolving function references in C
• Will greatly aid navigation/search features for this style of C code• Refactoring/programmatic edit support will be difficult
− Definite candidate for “Preview” pane in refactoring− Without full build-model to reference, resolution is heuristic-based
Stricter emphasis upon providing custom implementation for C & GCC AST & supporting data structures will be more compact memory wise
Syntax parsing will be more accurate
Algorithms for semantic analysis in C are far less rigorous
• Should yield better performance for nearly all clients
IBM Software Group | Rational software
© 2004 IBM Corporation18
Parser Language Variants
Interfaces org.eclipse.cdt.core.dom.ast – Base interfaces that apply to both ANSI C & ISO C++
org.eclipse.cdt.core.dom.ast.c – C99 specific sub-interfaces
org.eclipse.cdt.core.dom.ast.cpp – ISO C++ 98 specific sub-interfaces
org.eclipse.cdt.core.dom.ast.gnu – GNU extensions that apply to both C & C++
org.eclipse.cdt.core.dom.ast.gnu.c – GNU extensions that apply to C
org.eclipse.cdt.core.dom.ast.gnu.cpp – GNU extensions that apply to C++
Implementations org.eclipse.cdt.internal.core.parser2 – supporting infrastructure
org.eclipse.cdt.internal.core.parser2.c – C99 & GCC support
org.eclipse.cdt.internal.core.parser2.cpp – ISO C++ 98 & G++ support
Other variants may subclass C or C++ Source Code Parser and choose which GNU extensions they wish to enable through a configuration interface
IBM Software Group | Rational software
© 2004 IBM Corporation19
Physical Tree Hierarchy
C++ extends the common AST
GNU C++ extends the C++
C++ and C are unrelated outside of the common AST package
IBM Software Group | Rational software
© 2004 IBM Corporation20
Logical Tree Hierarchy
C extends common
C++ extends common
Theoretically, new bindings could be added for future variants e.g. GCC Signatures
IBM Software Group | Rational software
© 2004 IBM Corporation21
AST Rewriting
AST Rewriter accepts requests to change AST Add (Insert/Set), Remove, Replace
The new code can be a new AST Node or a String
AST Rewrite gathers change requests and then executes Analyzes the change requests for validity
Produces text edits that can be applied to documents to affect the change
The analysis can decide not to do a change because it is too hard E.g. when macros are involved
IBM Software Group | Rational software
© 2004 IBM Corporation22
DOM as a Service
We need to take some effort to minimize parsing within Eclipse C and C++ are mature languages : hence, larger source code bases
• Multiple clients parsing on resource-change events can cripple the system
A complex web of #include’s throughout the code base is difficult to optimize per-parse without having knowledge of previous parses
• Same with templates … The Indexer is already parsing continually, we should be able to leverage
that information for all other clients that require saved-file parse trees
Since parsing can be a processor and memory-intensive operation, it is difficult for the indexer to co-ordinate its priority vs. other parser clients for system resources
• users requests an Open Declaration which competes against a running indexer for memory & CPU
Eventual Goal
• AST Service w/Index • Incremental Parse