Scalable Semantic Web-based Source Code Search Infrastructure
I. Keivanloo, L. Roostapour, P. Schugerl, J. Rilling
SE-CodeSearch
ICSM 2010 ERA 2
Search
Who lives in London?
Who has relatives in London!
9/14/2010
Source code search
Where is it defined? Where is it called!
Query types (requirement-based classification)
• Pure structural (PSQ)
• Metadata (MDQ)
• Transitive closure-based (TCQ)
• Method call (MCQ)
• Absent information (AIQ)
• Mixed queries (MXQ)
SICS: Semantic-rich Internet-scale Code Search
• Supports all query types
• Handles a tera-scale repository
Is there any SICS?
• No
Challenges
• Incomplete code (no binaries)
• Repository evolution
– The crawler is working 24/7
– Dependent code might be indexed in any order
• Very large repository (tera-scale)
SE-CodeSearch
• Creates a small ontology for each code fragment, containing
– code facts
– static code analysis rules
• Stores these ontologies in an RDF repository
• Uses a backward-chaining reasoner to answer not only structural queries but also all the other query types (embedded code analysis at runtime)
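The slide describes answering queries with a backward-chaining reasoner over per-fragment facts kept in an RDF repository. The Python sketch below illustrates only the general idea, under stated assumptions: an in-memory set of triples stands in for the (disk-based) RDF repository, `extends` is a hypothetical predicate rather than a SICSONT term, and a single transitive inheritance rule is proved goal-first. It is not SE-CodeSearch's actual reasoner.

```python
from typing import Optional

# Hypothetical stand-in for the RDF repository: a set of
# (subject, predicate, object) triples held in memory. The predicate
# name "extends" is an illustrative assumption, not a SICSONT term.
Triple = tuple[str, str, str]


class TripleStore:
    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def objects(self, s: str, p: str) -> list[str]:
        """All objects o such that (s, p, o) is stored."""
        return sorted(o for (s2, p2, o) in self.triples if s2 == s and p2 == p)


def is_subclass_of(store: TripleStore, sub: str, sup: str,
                   seen: Optional[set[str]] = None) -> bool:
    """Backward chaining for two rules:
         subClassOf(X, Z) <- extends(X, Z)
         subClassOf(X, Z) <- extends(X, Y), subClassOf(Y, Z)
    The goal is expanded on demand, so only facts reachable from `sub`
    are read; no inferred triples are written back to the store."""
    if seen is None:
        seen = set()
    if sub in seen:  # guard against cyclic (ill-formed) hierarchies
        return False
    seen.add(sub)
    for parent in store.objects(sub, "extends"):
        if parent == sup or is_subclass_of(store, parent, sup, seen):
            return True
    return False


store = TripleStore()
store.add("a.Dog", "extends", "a.Animal")
store.add("a.Animal", "extends", "java.lang.Object")
print(is_subclass_of(store, "a.Dog", "java.lang.Object"))  # True
```

Because the rule is only expanded when a query arrives, newly crawled facts can simply be added without recomputing previously stored inferences, which is one plausible reason backward chaining suits a continuously growing repository.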
SICSONT
• Source Code Ontology for Internet-scale Static Analysis
http://aseg.cs.concordia.ca/ontology#sicsont
Semantic Web-based Static Code Analysis
• Knowledge-based approach
• Inference engine does the analysis
• Restricted to OWL-DL
– De facto standard for knowledge sharing
– Based on Description Logic
• Decidable
• Less expressive than rule-based formalisms
Semantic Web-based Static Code Analysis (Cont.)
• No compiler required
• Supported analyses
– Inheritance tree computation
– Fully qualified name resolution
– Method call/return statement and type resolution
• Translation template for each analysis rule
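Fully qualified name resolution without a compiler can be approximated from facts a parser sees in each file: its package declaration and its import statements. The Python sketch below assumes a hypothetical index `known_types` of the type names seen so far and follows Java's shadowing order (single-type imports, then the same package, then on-demand imports plus the implicit `java.lang.*`). Nested types, static imports and the actual SE-CodeSearch translation templates are out of scope.

```python
from dataclasses import dataclass, field


@dataclass
class CompilationUnit:
    """Facts a parser can extract from one .java file without compiling it."""
    package: str  # "" for the default package
    single_imports: list[str] = field(default_factory=list)     # import java.util.List;
    on_demand_imports: list[str] = field(default_factory=list)  # import java.io.*;  ->  "java.io"


def resolve_simple_name(name: str, unit: CompilationUnit,
                        known_types: set[str]) -> list[str]:
    """Return candidate fully qualified names for a simple type name.
    `known_types` is a hypothetical index of every FQN declared in the
    code indexed so far."""
    # 1. A single-type import is decisive even if its target is not indexed.
    for imp in unit.single_imports:
        if imp.rsplit(".", 1)[-1] == name:
            return [imp]
    # 2. A type declared in the same package.
    same_pkg = f"{unit.package}.{name}" if unit.package else name
    if same_pkg in known_types:
        return [same_pkg]
    # 3. Type-import-on-demand, including the implicit java.lang.*.
    packages = unit.on_demand_imports + ["java.lang"]
    hits = [f"{p}.{name}" for p in packages if f"{p}.{name}" in known_types]
    if hits:
        return hits
    # 4. Nothing indexed yet (no binaries): keep every plausible candidate
    #    instead of failing, so later-indexed code can confirm one of them.
    return [same_pkg] + [f"{p}.{name}" for p in packages]


unit = CompilationUnit("com.example", ["java.util.List"], ["java.io"])
known = {"com.example.Helper", "java.io.File", "java.lang.String"}
print(resolve_simple_name("File", unit, known))  # ['java.io.File']
print(resolve_simple_name("Foo", unit, known))   # ['com.example.Foo', 'java.io.Foo', 'java.lang.Foo']
```

Returning several candidates in the last step, rather than an error, is a design choice for incomplete code: once the missing package is indexed, the matching candidate can be confirmed.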
Scalability Test
Queries:
1. Transitive closure-based
2. Method call
Dataset: 600,000 Java classes (no binaries) from a very large corpus (~400 GB): http://www.ics.uci.edu/~lopes/datasets
Hardware:
• 3 GB RAM
• 3.40 GHz CPU
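A method call query needs type resolution on top of transitive closure: a call `d.speak()` on a `Dog` receiver binds to the nearest superclass that declares `speak`. A minimal Python sketch of such a query follows; the fact layout (`declares`, `extends`, `calls`), the class names and the static-binding simplification (no interfaces, no overloading by signature, no dynamic dispatch) are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical code facts (names are illustrative, not SICSONT terms):
#   declares: (type, method name) pairs, one per method declaration
#   extends:  type -> direct superclass (Java single class inheritance)
#   calls:    (calling method, static receiver type, called method name)
Declares = set[tuple[str, str]]
Extends = dict[str, str]
Call = tuple[str, str, str]


def resolve_declaration(receiver: str, method: str,
                        declares: Declares, extends: Extends) -> Optional[str]:
    """Walk up the superclass chain (a transitive-closure step) to the
    nearest type that declares `method`; None if not found in indexed code."""
    t: Optional[str] = receiver
    seen: set[str] = set()
    while t is not None and t not in seen:
        seen.add(t)
        if (t, method) in declares:
            return t
        t = extends.get(t)
    return None


def callers_of(target_type: str, method: str, calls: list[Call],
               declares: Declares, extends: Extends) -> list[str]:
    """Method call query: which methods call `target_type.method`?"""
    return sorted(caller for (caller, receiver, name) in calls
                  if name == method
                  and resolve_declaration(receiver, name, declares, extends) == target_type)


declares = {("Animal", "speak"), ("Cat", "speak")}
extends = {"Dog": "Animal", "Cat": "Animal", "Puppy": "Dog"}
calls = [("Main.run", "Dog", "speak"), ("Main.pet", "Cat", "speak"),
         ("Zoo.open", "Puppy", "speak")]
print(callers_of("Animal", "speak", calls, declares, extends))  # ['Main.run', 'Zoo.open']
```

`Main.pet` is excluded because `Cat` declares its own `speak`, so that call binds to `Cat.speak` rather than `Animal.speak`.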
SE-CodeSearch Highlights
• Avoids expensive knowledge modeling
• Optimized ontology population
• Backward-chaining reasoner
• Disk-based computation
– Works on minimal hardware
SE-CodeSearch Highlights (Cont.)
• Parallelization
– One-pass code analysis
– Static code analysis on
• complete code
• partial code
– Independent of parsing order
• first package A, then package B
• first package B, then package A
– Repository evolves incrementally
• Open-world reasoning (not available in relational databases)
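Independence of parsing order and open-world reasoning can be illustrated together. In the Python sketch below (a hypothetical `Repository` class, not the SE-CodeSearch API), loading a package only records facts and all inference happens at query time, so indexing package A before B or B before A yields the same answers. A question whose supporting facts are not indexed yet is answered "unknown" instead of the "no" a closed-world relational query would return. Answering "no" when a fully indexed chain ends at a root class is an extra assumption that relies on Java's single class inheritance; plain OWL open-world semantics would not draw that negative conclusion from missing facts.

```python
from typing import Optional

# One batch of hypothetical facts per parsed package:
# (declared class, its direct superclass, or None for a root class)
Batch = list[tuple[str, Optional[str]]]


class Repository:
    def __init__(self) -> None:
        self.superclass: dict[str, Optional[str]] = {}

    def index(self, batch: Batch) -> None:
        # Loading only records facts; nothing is inferred here, so the
        # stored state cannot depend on which package was parsed first.
        for cls, sup in batch:
            self.superclass[cls] = sup

    def subtype_status(self, sub: str, sup: str) -> str:
        """'yes' if extends facts link sub to sup (reflexively),
        'unknown' if the chain reaches a class that is not indexed yet,
        'no' only if the whole chain is indexed and ends at a root."""
        t = sub
        seen: set[str] = set()
        while t not in seen:
            seen.add(t)
            if t == sup:
                return "yes"
            if t not in self.superclass:
                return "unknown"  # open world: missing facts are not negative facts
            parent = self.superclass[t]
            if parent is None:
                return "no"
            t = parent
        return "no"  # cyclic hierarchy: sup was never reached


pkg_a = [("a.Animal", "a.Base"), ("a.Base", None)]
pkg_b = [("b.Dog", "a.Animal")]

r1 = Repository()
r1.index(pkg_b)
print(r1.subtype_status("b.Dog", "a.Base"))  # prints: unknown (package A not parsed yet)
r1.index(pkg_a)

r2 = Repository()
r2.index(pkg_a)
r2.index(pkg_b)
print(r1.subtype_status("b.Dog", "a.Base"), r2.subtype_status("b.Dog", "a.Base"))  # prints: yes yes
```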
The poster
Questions?
• SE-CodeSearch homepage: http://aseg.cs.concordia.ca/codesearch
• Source Code Ontology homepage: http://aseg.cs.concordia.ca/ontology
• ASEG Lab. homepage: http://aseg.cs.concordia.ca
• Any questions: [email protected]