Ph.D. Dissertation Forum – ICSM 2005
Ph.D. Dissertation
Reverse Engineering Web Applications
Porfirio Tramontana
University of Naples “Federico II”
Web Applications: open problems
In recent years, demand for Web Applications has grown rapidly, driven by the spread of the World Wide Web, which makes many services available all over the world
Web Applications have been developed with immature design methodologies and technologies
Today there are many legacy Web Applications in need of maintenance and re-engineering
Ph.D. Thesis Goals
• To propose models, methods and tools supporting the Reverse Engineering and Comprehension of Web Applications
• Reverse Engineering and Comprehension are fundamental tasks for efficiently supporting the maintenance, testing and quality assessment of Web Applications
Peculiarities of script-based Web Applications
• Page based
• Client-Server architecture
• Interpreted languages
• Client pages may be generated “on the fly”
• Client pages are executed in a browser (and the designer doesn’t know what kind of browser will be used)
• HTML interpreters are fault tolerant
... and so on ...
A process for the Reverse Engineering of Web Applications
[Figure: the Reverse Engineering process. Extraction phase: Static Analysis of the WA source code and Dynamic Analysis of the WA execution. Abstraction phase: identification of cloned components, identification of Interaction Design Patterns, assignment of concepts, functional clustering, Business Level UML diagram abstractions and maintainability assessment. Outputs: cloned components, Interaction Design Patterns, concepts describing Reverse Engineering artifacts, groups of pages realizing Web Application use cases, structural and Business Level UML diagrams.]
G.A. Di Lucca, A.R. Fasolino, P. Tramontana, “Reverse Engineering Web Application: the WARE approach”, Journal of Software Maintenance and Evolution: Research and Practice, Volume 16, Issue 1-2, Date: January - April 2004, Pages: 71-101
Analysis of Web Applications
1) Static analysis of the source code
A multi-language parser has been implemented that analyses the source code of Server pages, Client pages and Script modules.
During the analysis of server pages, facts about the client pages they build are also recorded.
Static analysis results are stored in an intermediate form and are used to populate a relational database
2) Dynamic Analysis
Built Client pages are analysed in order to add to the database facts observed by executing the application
The reference model adopted is an extension of the one proposed by Conallen for the forward engineering of Web Applications
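
The static analysis step described above can be illustrated with a minimal sketch. It is not the WARE parser: it only handles HTML client pages, and the class `FactExtractor`, the functions `open_repository` and `store_facts`, and the single `fact` table are hypothetical names chosen for this example. It assumes that a "fact" can be reduced to a (source page, relation, target) triple stored in a relational database.

```python
import sqlite3
from html.parser import HTMLParser


class FactExtractor(HTMLParser):
    """Collects (page, relation, target) facts from one HTML client page."""

    def __init__(self, page):
        super().__init__()
        self.page = page
        self.facts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.facts.append((self.page, "hyperlink", attrs["href"]))
        elif tag == "form" and attrs.get("action"):
            self.facts.append((self.page, "submit", attrs["action"]))
        elif tag in ("frame", "iframe") and attrs.get("src"):
            self.facts.append((self.page, "frame", attrs["src"]))


def open_repository():
    """Creates an in-memory database with one table of facts (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact (source TEXT, relation TEXT, target TEXT)")
    return conn


def store_facts(conn, page, html):
    """Parses one page and inserts the extracted facts into the repository."""
    extractor = FactExtractor(page)
    extractor.feed(html)
    extractor.close()
    conn.executemany("INSERT INTO fact VALUES (?, ?, ?)", extractor.facts)
    return extractor.facts
```

Once the facts are in the database, abstraction steps can be expressed as queries, for example `SELECT target FROM fact WHERE relation = 'submit'` to list the pages that receive form submissions.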
Model of Web Applications
[Figure: class diagram of the Web Application model. Classes: Web Page, Server Page, Client Page, Static Page, Built Page, Frameset, Frame, HTML Tag, Web Object, Form, Field, Textarea, Select, Button, Parameter, Hyperlink, Anchor, Download, Generic File, Media, Flash Object, Java Applet, Mail Address, Other Object, Client Script, Client Module, Client Function, Server Script, Server Function, Server Class, Server Cookie, Session Variable, Interface Object, DB Interface, Mail Interface, Server File Interface, Other Interface. Relationships: submits, include, source, redirect, event, Modify Tag.]
WARE (Web Application Reverse Engineering) tool
[Figure: WARE Architecture. Labels: WA Source Files; WARE-Tool; Interface (WARE GUI; Graphical Visualizer: Dotty, VCG, Rigi); Service (Extractor with parsers for HTML, ASP, VBS, PHP, JS, ...; Abstractor with IRF Translator, Query Executor and UML Diagrams Abstractor); Repository (IRF, DBR, Diagrams).]
[Figure: Detail Class Diagram abstracted by WARE. Pages /autenticazionedocente.html, /check.asp and /areadocente.html, connected by Submit, Builds and Redirect relationships.]
G.A. Di Lucca, A.R. Fasolino, U. De Carlini, F. Pace, P. Tramontana, “WARE: a tool for the Reverse Engineering of web Applications”, Proc. of 6th IEEE European Conference on Software Maintenance and Reengineering, CSMR 2002, IEEE CS Press, Los Alamitos, CA, Pages: 241-250
Functional Clustering of Web Pages
• Goal:
To cluster together subsets of components realizing Web Application functionalities
• Proposed Technique:
Hierarchical clustering algorithm, grouping Web Application pages in subsets, maximizing the cohesion within each subset and minimizing the coupling between subsets
G. A. Di Lucca, A.R. Fasolino, U. De Carlini, F. Pace, P. Tramontana, “Comprehending Web Applications by a Clustering Based Approach”, Proc. of 10th IEEE Workshop on Program Comprehension, IWPC 2002, Pages:261 - 270
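
The idea of the clustering technique can be sketched as a simple agglomerative procedure. This is an illustrative simplification, not the algorithm of the cited paper: it assumes that coupling between two clusters is the number of links (hyperlink, submit, redirect and so on) crossing them, and it repeatedly merges the most coupled pair, so that strongly connected pages end up inside the same cluster. The function names and the `min_coupling` stopping threshold are hypothetical.

```python
from itertools import combinations


def coupling(a, b, links):
    """Number of links with one endpoint in cluster a and the other in b."""
    return sum(1 for s, t in links if (s in a and t in b) or (s in b and t in a))


def cluster_pages(pages, links, min_coupling=1):
    """Returns the list of partitions produced at each merge step."""
    clusters = [frozenset([p]) for p in pages]
    history = [list(clusters)]
    while len(clusters) > 1:
        # Pick the pair of clusters connected by the largest number of links
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: coupling(pair[0], pair[1], links))
        if coupling(a, b, links) < min_coupling:
            break  # the remaining clusters are not connected enough to merge
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        history.append(list(clusters))
    return history
```

Each entry of the returned history is one level of the hierarchy, so an analyst can choose the level whose groups best match the application's use cases.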
Concept Assignment
Goal:
To identify the most relevant concepts in client pages, in order to suggest a semantic description of client pages and of functional clusters of pages
Proposed Technique:
Heuristic algorithms based on Information Retrieval
Candidate concepts are searched for in the textual content of client pages
Single common words and short word sequences are candidates to be concepts
[Figure: class diagram of the concept assignment model. Classes: Web Page, Server Page, Client Page (File name), Static Client Page, Built Client Page, Control Component, Data Component, Tag (Name, Weight), Attribute (Name), Text (Weight), Word, StopWord, Concept. Relationships: <<builds>>, nested in, has synonym, has stem.]
G.A. Di Lucca, A.R. Fasolino, P. Tramontana, U. De Carlini, “Supporting Concept Assignment in the Comprehension of Web Applications”, Proceedings of the 28th IEEE Annual International Computer Software and Applications Conference, COMPSAC 2004
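
A minimal sketch of the concept assignment heuristic follows. It assumes that single words and two-word sequences found in page text are scored by the weight of the enclosing tag, after discarding stop words. The stop-word list, the tag weights and the names `WeightedText` and `candidate_concepts` are illustrative assumptions, not values from the cited paper, and stemming and synonyms are left out.

```python
import re
from collections import Counter
from html.parser import HTMLParser

STOP_WORDS = {"the", "a", "of", "and", "to", "in", "for"}  # illustrative list
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0}  # assumed weights; other tags weigh 1.0


class WeightedText(HTMLParser):
    """Scores words and word pairs by the heaviest tag that encloses them."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.scores = Counter()

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching tag (tolerates unclosed tags)
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        weight = max((TAG_WEIGHTS.get(t, 1.0) for t in self.stack), default=1.0)
        words = re.findall(r"[a-z]+", data.lower())
        for i, word in enumerate(words):
            if word in STOP_WORDS:
                continue
            self.scores[word] += weight
            # Short word sequence: two adjacent words, neither a stop word
            if i + 1 < len(words) and words[i + 1] not in STOP_WORDS:
                self.scores[word + " " + words[i + 1]] += weight


def candidate_concepts(html, top=3):
    """Returns the top-scoring candidate concepts of one client page."""
    parser = WeightedText()
    parser.feed(html)
    parser.close()
    return parser.scores.most_common(top)
```

The ranked candidates are only suggestions: a human analyst still decides which of them describe the page or the cluster.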
Interaction Design Patterns Identification
Goal:
To identify repetitive structures in Web Client pages
These structures can be related to known Programming Patterns
Proposed Technique:
Statistical methodology based on features extracted from the source code of client pages:
presence, quantity and dimension of forms, tables, input fields, frames, common keywords and so on.
G.A. Di Lucca, A.R. Fasolino, P. Tramontana, “Recovering Interaction Design Patterns in Web Applications”, submitted to 9th IEEE European Conference on Software Maintenance and Reengineering, CSMR 2005
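
The feature extraction mentioned above might look like the following sketch, which only covers the counting step and not the statistical analysis. The selected tags, the keywords and the names `FeatureCounter` and `page_features` are assumptions made for illustration.

```python
from collections import Counter
from html.parser import HTMLParser

FEATURE_TAGS = ("form", "table", "input", "frame", "select", "textarea")  # assumed
KEYWORDS = ("login", "password", "search")  # assumed common keywords


class FeatureCounter(HTMLParser):
    """Counts selected tags and keyword occurrences in the visible text."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in FEATURE_TAGS:
            self.counts[tag] += 1

    def handle_data(self, data):
        text = data.lower()
        for keyword in KEYWORDS:
            self.counts["kw:" + keyword] += text.count(keyword)


def page_features(html):
    """Returns a fixed-order feature dictionary for one client page."""
    parser = FeatureCounter()
    parser.feed(html)
    parser.close()
    names = list(FEATURE_TAGS) + ["kw:" + k for k in KEYWORDS]
    return {name: parser.counts[name] for name in names}
```

A page with one form, two input fields and the words "login" and "password" would, under these assumptions, be a plausible candidate for a login pattern.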
Identification of cloned components
Goals:
• Re-Engineering of cloned components via code transformations
• Classification of Built Client Pages
• Identification of reusable Programming Patterns
Proposed Techniques:
• Extraction of features from the structure of Client pages and from the source code of server pages
• Computation of distance measures between pages (Euclidean distance, Levenshtein edit distance)
G.A. Di Lucca, A.R. Fasolino, P. Tramontana, U. De Carlini, “Identifying Reusable Components in Web Applications”, IASTED International Conference on Software Engineering, SE 2004, pp.526-531
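
The two distance measures named above can be sketched as follows. The choice of representing a page as its sequence of opening tags (for the edit distance) or as a numeric feature vector (for the Euclidean distance) is an assumption made for this example, and the helper names are hypothetical.

```python
import math
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Records the sequence of opening tags of a page."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    parser = TagSequence()
    parser.feed(html)
    parser.close()
    return parser.tags


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(previous[j] + 1,              # deletion
                               current[j - 1] + 1,           # insertion
                               previous[j - 1] + (x != y)))  # substitution
        previous = current
    return previous[-1]


def euclidean(u, v):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
```

Two pages whose tag sequences have distance 0 share the same structure and are candidate clones; small non-zero distances suggest near clones that may differ only in a few repeated elements.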
Abstraction of Business Level Models
Goals:
To abstract object-oriented business level models of Web Applications
Proposed Techniques:
• Classes and attributes are identified by analysing the data exchanged between users, Web pages and databases
• Class methods are identified by analysing the functions implemented by clusters of pages
• Relationships between classes are identified by analysing data structures and data flow among pages
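[Figure: example business level class diagram. Classes and attributes:]
Tutoring request: Date
Teacher: Name, Surname, E-mail, Phone number, Password, Code
Tutoring: Date, Start time, End time
News: Number, Date, Text
Student: Name, Surname, E-mail, Password, Code, Phone number
Exam: Date, Time, Classroom
Course: Academic year, Code, Name
Exam Reservation: Date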
G.A. Di Lucca, A.R. Fasolino, U. De Carlini, P. Tramontana, “Recovering a Business Object Model from Web Applications”, Proceedings of the 27th IEEE Annual International Computer Software and Applications Conference, COMPSAC 2003, Pages: 348-353
Maintainability Model
Goals:
To propose models and methods for assessing the maintainability of Web Applications
Proposed Models and Techniques:
• Adaptation to Web Applications of the Oman model (conceived for traditional applications)
• Selection of a set of product metrics and proposal of a maintainability index that can be calculated with negligible effort and time
G.A. Di Lucca, A.R. Fasolino, P. Tramontana, C.A. Visaggio, “Towards the definition of a maintainability model for web applications”, Proceedings of the Eighth IEEE European Conference on Software Maintenance and Reengineering, CSMR 2004, Pages: 279-287
Current and future work
• Techniques for the dynamic analysis of Web Applications
• Accessibility assessment of Client pages
• Migration from Web Applications to Web Services
• Testing of Web Applications: Mutation Testing techniques
• Maintainability assessment: definition of ageing measures for Web Applications
G.A. Di Lucca, M. Di Penta, A.R. Fasolino, P. Tramontana, “Supporting Web Application Evolution by Dynamic Analysis”, IWPSE 2005
G.A. Di Lucca, A.R. Fasolino, P. Tramontana, “Web Site Accessibility: Identifying and Fixing of Accessibility Problems in Client Page Code”, WSE 2005