Introducing Natural Language Program Analysis
![Page 1: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/1.jpg)
Introducing Natural Language Program Analysis
Lori Pollock, K. Vijay-Shanker, David Shepherd,
Emily Hill, Zachary P. Fry, Kishen Maloor
![Page 2: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/2.jpg)
NLPA Research Team Leaders
Lori Pollock, “Team Captain”
K. Vijay-Shanker, “The Umpire”
![Page 3: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/3.jpg)
Problem
Modern software is large and complex
object oriented class hierarchy
Software development tools are needed
![Page 4: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/4.jpg)
Successes in Software Development Tools
object oriented class hierarchy
Good with local tasks
Good with traditional structure
![Page 5: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/5.jpg)
object oriented class hierarchy
Scattered tasks are difficult
Programmers use more than traditional program structure
Issues in Software Development Tools
![Page 6: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/6.jpg)
public interface Storable{...
activate tool
save drawing
update drawing
undo action
public void Circle.save()
//Store the fields in a file....
object oriented system
Key Insight: Programmers leave natural language clues that
can benefit software development tools
Observations in Software Development Tools
![Page 7: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/7.jpg)
Studies on choosing identifiers
Impact of human cognition on names [Liblit et al. PPIG 06] Metaphors, morphology, scope, part of speech hints Hints for understanding code
Analysis of Function identifiers [Caprile and Tonella WCRE 99] Lexical, syntactic, semantic Use for software tools: metrics, traceability, program understanding
Carla, the compiler writer: “I don’t care about names.”
Pete, the programmer: “So, I could use x, y, z. But no one will understand my code.”
![Page 8: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/8.jpg)
Our Research Path
Motivated usefulness of exploiting natural language (NL) clues in tools [MACS 05, LATE 05]
Developed extraction process and an NL-based program representation [AOSD 06]
Created and evaluated a concern location tool and an aspect miner with NL-based analysis [ASE 05, AOSD 07, PASTE 07]
![Page 9: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/9.jpg)
pic
Name: David C Shepherd
Nickname: Leadoff Hitter
Current Position: PhD May 30, 2007
Future Position: Postdoc, Gail Murphy
Stats
Year    coffees/day    red marks/paper draft
2002    0.1            500
2007    2.2            100
![Page 10: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/10.jpg)
Aspect Mining
Aspect-Oriented Programming
Aspect Mining Task
Locate refactoring candidates
Applying NL Clues for
Molly, the Maintainer
How can I fix Paul’s
atrocious code?
![Page 11: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/11.jpg)
Timna: An Aspect Mining Framework [ASE 05]
Uses program analysis clues for mining
Combines clues using machine learning
Evaluated vs. Fan-in
Precision (quality) and Recall (completeness)
Tool      P     R
Fan-In    37    2
Timna     62    60
![Page 12: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/12.jpg)
Integrating NL Clues into Timna
iTimna (Timna with NL)
Integrates natural language clues
Example: Opposite verbs (open and close)
Tool      P     R
Fan-In    37    2
Timna     62    60
iTimna    81    73
Natural language information increases the effectiveness of Timna
[Come back Thurs 10:05am]
![Page 13: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/13.jpg)
Concern Location
60-90% software costs spent on reading and navigating code for maintenance*
(fixing bugs, adding features, etc.)
*[Erlikh] Leveraging Legacy System Dollars for E-Business
Applying NL Clues for
Motivation
![Page 14: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/14.jpg)
Key Challenge: Concern Location
Find, collect, and understand all source code related to a particular concept
Concerns are often crosscutting
![Page 15: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/15.jpg)
State of the Art for Concern Location
Mining Dynamic Information [Wilde ICSM 00]
Program Structure Navigation [Robillard FSE 05, FEAT, Schaefer ICSM 05]
Search-Based Approaches
  RegEx [grep, Aspect Mining Tool 00]
  LSA-Based [Marcus 04]
  Word-Frequency Based [GES 06]
Reduced to similar problem
Slow
Fast
Fragile
Sensitive
No Semantics
![Page 16: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/16.jpg)
Limitations of Search Techniques
1. Return large result sets
2. Return irrelevant results
3. Return hard-to-interpret result sets
![Page 17: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/17.jpg)
The Find-Concept Approach
[Diagram labels: concept; concrete query; Find-Concept; recommendations; source code; NL-based code representation; natural language information; result graph (Method a, Method b, Method c, Method d, Method e)]
1. More effective search
2. Improved search terms
3. Understandable results
![Page 18: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/18.jpg)
Underlying Program Analysis
Action-Oriented Identifier Graph (AOIG) [AOSD 06]
  Provides access to NL information
  Provides interface between NL and traditional
Word Recommendation Algorithm
  NL-based
    Stemmed/Rooted: complete, completing
    Synonym: finish, complete
  Combining NL and Traditional
    Co-location: completeWord()
![Page 19: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/19.jpg)
Experimental Evaluation
Research Questions
  Which search tool is most effective at forming and executing a query for concern location?
  Which search tool requires the least human effort to form an effective query?
Methodology: 18 developers complete nine concern location tasks on medium-sized (>20 KLOC) programs
Measures: Precision (quality), Recall (completeness), F-Measure (combination of both P & R)
Find Concept, GES, ELex
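The three measures listed above can be computed from the set of methods a tool returns and the set judged relevant. The following is a minimal sketch, not the study's evaluation harness: the class name `RetrievalMetrics` and the example method names are hypothetical, and the balanced F-measure (harmonic mean of P and R) is an assumed variant, since the slide only says the F-Measure combines both.

```java
import java.util.HashSet;
import java.util.Set;

final class RetrievalMetrics {
    /** Number of returned items that are also in the relevant set. */
    private static int hits(Set<String> returned, Set<String> relevant) {
        Set<String> common = new HashSet<>(returned);
        common.retainAll(relevant);
        return common.size();
    }

    /** Precision (quality): fraction of returned items that are relevant. */
    static double precision(Set<String> returned, Set<String> relevant) {
        return returned.isEmpty() ? 0.0 : (double) hits(returned, relevant) / returned.size();
    }

    /** Recall (completeness): fraction of relevant items that were returned. */
    static double recall(Set<String> returned, Set<String> relevant) {
        return relevant.isEmpty() ? 0.0 : (double) hits(returned, relevant) / relevant.size();
    }

    /** Balanced F-measure: harmonic mean of precision and recall (assumed variant). */
    static double fMeasure(double p, double r) {
        return (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Hypothetical result set and oracle for one concern location task.
        Set<String> returned = Set.of("saveDrawing", "storeFields", "undoAction");
        Set<String> relevant = Set.of("saveDrawing", "storeFields");
        double p = precision(returned, relevant);
        double r = recall(returned, relevant);
        System.out.printf("P=%.2f R=%.2f F=%.2f%n", p, r, fMeasure(p, r));
    }
}
```

Empty result sets are mapped to 0.0 rather than raising an error, which is a design choice of this sketch and not something the slides specify.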
![Page 20: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/20.jpg)
Overall Results
Effectiveness
  FC > ELex with statistical significance
  FC >= GES on 7/9 tasks
  FC is more consistent than GES
Effort
  FC = ELex = GES
FC is more consistent and more effective in experimental study without requiring more effort
Across all tasks
![Page 21: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/21.jpg)
Natural Language Extraction from Source Code
Key Challenges:
  Decode name usage
  Develop automatic extraction process
  Create NL-based program representation
Molly, the Maintainer
What was Pete thinking
when he wrote this code?
![Page 22: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/22.jpg)
Natural Language: Which Clues to Use?
Software Maintenance
  Typically focused on actions
  Objects are well-modularized
Maintenance Requests
![Page 23: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/23.jpg)
Natural Language: Which Clues to Use?
Software Maintenance
  Typically focused on actions
  Objects are well-modularized
Focus on actions
  Correspond to verbs
  Verbs need Direct Object (DO)
Extract verb-DO pairs
![Page 24: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/24.jpg)
Extracting Verb-DO Pairs
Two types of extraction
class Player {
    /**
     * Play a specified file with specified time interval
     */
    public static boolean play(final File file, final float fPosition, final long length) {
        fCurrent = file;
        try {
            playerImpl = null;
            //make sure to stop non-fading players
            stop(false);
            //Choose the player
            Class cPlayer = file.getTrack().getType().getPlayerImpl();
            …
}
Extraction from comments
Extraction from method signatures
![Page 25: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/25.jpg)
public UserList getUserListFromFile( String path ) throws IOException {
    try {
        File tmpFile = new File( path );
        return parseFile(tmpFile);
    } catch( java.io.IOException e ) {
        throw new IOException( "UserList format issue" + path + " file " + e );
    }
}
Extracting Clues from Signatures
1. POS Tag Method Name
2. Chunk Method Name
3. Identify Verb and Direct-Object (DO)
get<verb> User<adj> List<noun> From <prep> File <noun>
get<verb phrase> User List<noun phrase> From File <prep phrase>
POS Tag
Chunk
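The signature steps above can be approximated without a real tagger. The sketch below is only an illustration under stated assumptions, not the authors' POS tagger or chunker: it splits a method name at camel-case boundaries, assumes the leading word is the verb, and takes the words up to the first preposition as the direct object. The class name `VerbDoSketch` and the small preposition list are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class VerbDoSketch {
    // Assumed, deliberately tiny preposition list; a real tagger would use a lexicon.
    private static final Set<String> PREPOSITIONS =
            Set.of("from", "to", "in", "on", "with", "for", "by", "at", "of");

    /** Splits an identifier where a lowercase letter or digit is followed by an uppercase letter. */
    static List<String> splitCamelCase(String identifier) {
        return Arrays.asList(identifier.split("(?<=[a-z0-9])(?=[A-Z])"));
    }

    /**
     * Returns {verb, directObject}. Assumes the first word is the verb and the
     * direct object is every following word before the first preposition.
     */
    static String[] extractVerbDo(String methodName) {
        List<String> words = splitCamelCase(methodName);
        String verb = words.get(0).toLowerCase(Locale.ROOT);
        List<String> doWords = new ArrayList<>();
        for (String word : words.subList(1, words.size())) {
            if (PREPOSITIONS.contains(word.toLowerCase(Locale.ROOT))) {
                break; // the prepositional phrase starts here
            }
            doWords.add(word);
        }
        return new String[] { verb, String.join(" ", doWords) };
    }

    public static void main(String[] args) {
        System.out.println(splitCamelCase("getUserListFromFile"));
        System.out.println(Arrays.toString(extractVerbDo("getUserListFromFile")));
    }
}
```

The leading-verb assumption fails for names such as mouseDragged(), which is why a single pattern is not enough and further extraction rules are needed.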
![Page 26: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/26.jpg)
pic
Name: Zak Fry
Nickname: The Rookie
Current Position: Upcoming senior
Future Position: Graduate School
Stats
Year    diet cokes/day    lab days/week
2006    1                 2
2007    6                 8
![Page 27: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/27.jpg)
Developing rules for extraction
For many methods:
  Identify relevant verb (V) and direct object (DO) in method signature
  Classify pattern of V and DO locations
  If new pattern, create new extraction rule
[Diagram: positions of verb and DO within method signatures]
![Page 28: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/28.jpg)
Our Current Extraction Rules
4 general rules with subcategories:
Rule                 Example                DO         Verb
Left Verb            URL parseURL()         URL        parse
Right Verb           void mouseDragged()    mouse      dragged
Generic Verb         void Host.onSaved()    host       saved
Unidentified Verb    void message()         message    -
![Page 29: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/29.jpg)
Example: Sub-Categories for Left-Verb General Rule
Look beyond the method name:
Parameters, Return type, Declaring class name, Type hierarchy
Subcategory
1) Standard left verb
2) No DO in method name; has parameters; non object return type
3) No DO in method name; no parameters; no return type
4) Creational left verb; has return type
5) No DO in method name; has parameters; return type is more specific than parameters in type hierarchy
6) No DO in method name; parameters are more specific than parameters in type hierarchy
2) No DO in method name; has parameters; non object return type
Verb-DO pair: <remove, UserID>
Left Verb
![Page 30: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/30.jpg)
Representing Verb-DO Pairs
Action-Oriented Identifier Graph (AOIG)
[Diagram: verb nodes (verb1, verb2, verb3) and DO nodes (DO1, DO2, DO3) connect to verb-DO pair nodes (verb1, DO1), (verb1, DO2), (verb3, DO2), (verb2, DO3); "use" entries link the pairs to source code files]
![Page 31: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/31.jpg)
Action-Oriented Identifier Graph (AOIG)
[Diagram: verb nodes (play, add, remove) and DO nodes (file, playlist, listener) connect to verb-DO pair nodes (play, file), (play, playlist), (remove, playlist), (add, listener); "use" entries link the pairs to source code files]
Representing Verb-DO Pairs
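One way to picture the AOIG described above is as three indexes: verbs to their direct objects, direct objects to their verbs, and each verb-DO pair to the places it is used. The following is a minimal sketch under that assumption, not the published data structure; the class name `AoigSketch`, its method names, and the file names in the example are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class AoigSketch {
    private final Map<String, Set<String>> dosByVerb = new LinkedHashMap<>();
    private final Map<String, Set<String>> verbsByDo = new LinkedHashMap<>();
    private final Map<String, List<String>> usesByPair = new LinkedHashMap<>();

    private static String key(String verb, String directObject) {
        return verb + "," + directObject;
    }

    /** Records one occurrence ("use") of a verb-DO pair at a source location. */
    void addUse(String verb, String directObject, String location) {
        dosByVerb.computeIfAbsent(verb, k -> new LinkedHashSet<>()).add(directObject);
        verbsByDo.computeIfAbsent(directObject, k -> new LinkedHashSet<>()).add(verb);
        usesByPair.computeIfAbsent(key(verb, directObject), k -> new ArrayList<>()).add(location);
    }

    /** Direct objects that appear with the given verb. */
    Set<String> directObjectsOf(String verb) {
        return dosByVerb.getOrDefault(verb, Set.of());
    }

    /** Verbs that act on the given direct object. */
    Set<String> verbsOf(String directObject) {
        return verbsByDo.getOrDefault(directObject, Set.of());
    }

    /** Source locations where the verb-DO pair occurs. */
    List<String> usesOf(String verb, String directObject) {
        return usesByPair.getOrDefault(key(verb, directObject), List.of());
    }

    public static void main(String[] args) {
        AoigSketch graph = new AoigSketch();
        // Pairs from the slide; the file names are made up for illustration.
        graph.addUse("play", "file", "Player.java");
        graph.addUse("play", "playlist", "Playlist.java");
        graph.addUse("remove", "playlist", "Playlist.java");
        graph.addUse("add", "listener", "Events.java");
        System.out.println(graph.directObjectsOf("play"));
        System.out.println(graph.verbsOf("playlist"));
    }
}
```

Looking up by verb or by direct object is what lets a search tool move from a query word to related words and then to the code that uses them.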
![Page 32: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/32.jpg)
Evaluation of Extraction Process
Compare automatic vs ideal (human) extraction
  300 methods from 6 medium open source programs
  Annotated by 3 Java developers
Promising Results
  Precision: 57%
  Recall: 64%
Context of Results
  Did not analyze trivial methods
  On average, at least verb OR direct object obtained
![Page 33: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/33.jpg)
pic
Name: Emily Gibson Hill
Nickname: Batter on Deck
Current Position: 2nd year PhD Student
Future Position: PhD Candidate
Stats
Year    cokes/day    meetings/week
2003    0.2          1
2007    2            5
![Page 34: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/34.jpg)
Program Exploration
Purpose: Expedite software maintenance and program comprehension
Key Insight: Automated tools can use program structure and identifier names to save the developer time and effort
Ongoing work:
![Page 35: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/35.jpg)
Dora the Program Explorer*
* Dora comes from exploradora, the Spanish word for a female explorer.
Dora
Natural Language Query
• Maintenance request
• Expert knowledge
• Query expansion
Program Structure
• Representation
  • Current: call graph
• Seed starting point
Relevant Neighborhood
• Subgraph relevant to query
![Page 36: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/36.jpg)
State of the Art in Exploration
Structural (dependence, inheritance)
  Slicing
  Suade [Robillard 2005]
Lexical (identifier names, comments)
  Regular expressions: grep, Eclipse search
  Information Retrieval: FindConcept, Google Eclipse Search [Poshyvanyk 2006]
![Page 37: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/37.jpg)
Motivating need for structural and lexical information
Program: JBidWatcher, an eBay auction sniping program
Bug: User-triggered add auction event has no effect
Task: Locate code related to ‘add auction’ trigger
Seed: DoAction() method, from prior knowledge
Example Scenario
![Page 38: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/38.jpg)
[Figure: call graph around DoAction(); the many irrelevant callees are drawn as DoNada()]
Using only structural information
DoAction() has 38 callees, only 2/38 are relevant
Relevant Methods
Irrelevant Methods
Looking for: ‘add auction’ trigger
DoAction()
DoAdd()
DoPasteFromClipboard()
And what if you wanted to explore more than one edge away?
Locates locally relevant items, but many irrelevant
![Page 39: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/39.jpg)
Using only lexical information
50/1812 methods contain matches to ‘add*auction’ regular expression query
Only 2/50 are relevant
Locates globally relevant items, but many irrelevant
Looking for: ‘add auction’ trigger
![Page 40: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/40.jpg)
[Figure: call graph around DoAction(); the many irrelevant callees are drawn as DoNada()]
Combining Structural & Lexical Information
Structural: guides exploration from seed
Looking for: ‘add auction’ trigger
Relevant Neighborhood
DoAction()
DoPasteFromClipboard()
DoAdd()
Lexical: prunes irrelevant edges
![Page 41: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/41.jpg)
The Dora Approach
Determine method relevance to query
Calculate lexical-based relevance score
Low-scored methods pruned from neighborhood
Recursively explore
Prune irrelevant structural edges from seed
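The exploration loop described above can be sketched as a depth-first walk over a call graph that follows only edges to methods whose relevance score meets a threshold. This is a hedged illustration, not Dora's implementation: the class name `NeighborhoodSketch`, the graph, and the scores are all hypothetical, and the scores are supplied as a precomputed map rather than calculated here.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class NeighborhoodSketch {
    /** Returns the seed plus every method reachable through callees scoring at or above the threshold. */
    static Set<String> explore(String seed,
                               Map<String, List<String>> callees,
                               Map<String, Double> score,
                               double threshold) {
        Set<String> neighborhood = new LinkedHashSet<>();
        neighborhood.add(seed);
        visit(seed, callees, score, threshold, neighborhood);
        return neighborhood;
    }

    private static void visit(String method,
                              Map<String, List<String>> callees,
                              Map<String, Double> score,
                              double threshold,
                              Set<String> neighborhood) {
        for (String callee : callees.getOrDefault(method, List.of())) {
            boolean relevant = score.getOrDefault(callee, 0.0) >= threshold;
            // add() returns false for methods already visited, which also guards against cycles.
            if (relevant && neighborhood.add(callee)) {
                visit(callee, callees, score, threshold, neighborhood);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical call graph and scores.
        Map<String, List<String>> callees = Map.of(
                "seed", List.of("a", "b", "c"),
                "a", List.of("d"),
                "b", List.of("f"),
                "c", List.of("e"));
        Map<String, Double> score = Map.of(
                "a", 0.9, "b", 0.2, "c", 0.6, "d", 0.7, "e", 0.1, "f", 0.95);
        System.out.println(explore("seed", callees, score, 0.5));
    }
}
```

In this sketch a pruned method cuts off everything reachable only through it: `f` scores highly but is never visited because its only caller `b` falls below the threshold.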
![Page 42: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/42.jpg)
Calculating Relevance Score: Term Frequency
Score based on query term frequency of the method
6 query term occurrences
Only 2 occurrences
Query: ‘add auction’
![Page 43: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/43.jpg)
Calculating Relevance Score: Location Weights
Query: ‘add auction’
Weigh term frequency based on location:
  Method name more important than body
  Method body statements normalized by length
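A location-weighted term frequency score along these lines could look like the sketch below. It is an assumption-laden illustration, not Dora's actual formula: the weights 0.7 and 0.3, the class name `RelevanceSketch`, and the way the body count is normalized (query term occurrences divided by the number of words in the body) are all hypothetical choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class RelevanceSketch {
    // Assumed weights: the method name counts for more than the body.
    static final double NAME_WEIGHT = 0.7;
    static final double BODY_WEIGHT = 0.3;

    /** Breaks text into lowercase words, splitting on punctuation and camel-case boundaries. */
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.split("[^A-Za-z0-9]+")) {
            for (String word : token.split("(?<=[a-z0-9])(?=[A-Z])")) {
                if (!word.isEmpty()) {
                    out.add(word.toLowerCase(Locale.ROOT));
                }
            }
        }
        return out;
    }

    /** Combines the share of query terms found in the name with the length-normalized body frequency. */
    static double score(Set<String> queryTerms, String methodName, String body) {
        List<String> nameWords = words(methodName);
        List<String> bodyWords = words(body);
        long matchedInName = queryTerms.stream().filter(nameWords::contains).count();
        long hitsInBody = bodyWords.stream().filter(queryTerms::contains).count();
        double nameScore = queryTerms.isEmpty() ? 0.0 : (double) matchedInName / queryTerms.size();
        double bodyScore = bodyWords.isEmpty() ? 0.0 : (double) hitsInBody / bodyWords.size();
        return NAME_WEIGHT * nameScore + BODY_WEIGHT * bodyScore;
    }

    public static void main(String[] args) {
        Set<String> query = Set.of("add", "auction");
        // The method bodies are invented for illustration.
        System.out.println(score(query, "DoAdd", "addAuction(entry); auctionList.add(entry);"));
        System.out.println(score(query, "DoNada", "return;"));
    }
}
```

Normalizing by body length keeps a long method from outscoring a short one merely because it contains more words.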
![Page 44: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/44.jpg)
Dora explores ‘add auction’ trigger
From DoAction() seed:
Correctly identified at 0.5 threshold
  DoAdd() (0.93)
  DoPasteFromClipboard() (0.60)
With only one false positive
  DoSave() (0.52)
![Page 45: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/45.jpg)
Summary
NL technology used
  Synonyms, collocations, morphology, word frequencies, part-of-speech tagging, AOIG
Evaluation indicates
  Natural language information shows promise for improving software development tools
Key to success
  Accurate extraction of NL clues
![Page 46: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/46.jpg)
Our Current and Future Work
Basic NL-based tools for software
  Abbreviation expander
  Program synonyms
  Determining relative importance of words
Integrating information retrieval techniques
![Page 47: Introducing Natural Language Program Analysis](https://reader036.fdocuments.net/reader036/viewer/2022062322/568144db550346895db1a6d1/html5/thumbnails/47.jpg)
Posed Questions for Discussion
What open problems faced by software tool developers can be mitigated by NLPA?
Under what circumstances is NLPA not useful?