Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary
icsm-2011 -
![Page 1: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/1.jpg)
EXPANDING IDENTIFIERS TO NORMALIZE SOURCE CODE VOCABULARY
PRESENTED BY DAWN LAWRIE
LOYOLA UNIVERSITY MARYLAND
IN COLLABORATION WITH DAVE BINKLEY
Friday, October 7, 11
![Page 2: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/2.jpg)
VOCABULARY MISMATCH
DIFFERENT VOCABULARY IN SOURCE CODE AND OTHER SOFTWARE ARTIFACTS
EXAMPLE
REQUIREMENT - “FEATURE LOCATION”
SOURCE CODE - “FEATURELOCATION”
OR WORSE “FLOC”
![Page 3: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/3.jpg)
PURPOSE OF NORMALIZE
COPE WITH VOCABULARY MISMATCH BETWEEN
SOURCE CODE
OTHER SOFTWARE DOCUMENTS
![Page 4: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/4.jpg)
EXAMPLE PROBLEMS
CONSIDER IDENTIFIERS
FEATURELOCATION → FEATURE LOCATION: SPLITTING PROBLEM
FLOC → F LOC → FEATURE LOCATION: SPLITTING AND EXPANSION PROBLEM
![Page 8: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/8.jpg)
WHY NORMALIZE?
MANY SE PROBLEMS CAN BE ADDRESSED USING INFORMATION RETRIEVAL (IR) TECHNIQUES
UN-NORMALIZED CODE LEADS TO AN UNDERESTIMATE OF THE IMPORTANCE OF CRUCIAL WORDS
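A minimal sketch of the underestimate, assuming a tiny hand-written expansion table (`EXPANSIONS` is hypothetical and stands in for the output of a normalizer). It counts term occurrences with and without normalization, which is the raw signal most IR weighting schemes start from.

```python
from collections import Counter

# Hypothetical expansion table, for illustration only.
EXPANSIONS = {
    "featurelocation": ["feature", "location"],
    "floc": ["feature", "location"],
}

def term_counts(tokens, normalize):
    """Count terms, optionally replacing identifiers by their expansions."""
    counts = Counter()
    for tok in tokens:
        tok = tok.lower()
        counts.update(EXPANSIONS.get(tok, [tok]) if normalize else [tok])
    return counts

tokens = ["featureLocation", "floc", "feature", "index"]
raw = term_counts(tokens, normalize=False)
norm = term_counts(tokens, normalize=True)
print(raw["feature"], norm["feature"])    # occurrences of "feature" before and after
print(raw["location"], norm["location"])  # "location" is invisible without normalization
```

Without normalization the word "location" never appears in the counts at all, so any frequency-based weight for it would be zero.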
![Page 9: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/9.jpg)
NORMALIZE PROBLEM STATEMENT
FIND THE BEST EXPANSION OVER ALL POSSIBLE SPLITS
FLOC → FEATURE LOCATION
![Page 12: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/12.jpg)
NORMALIZE ALGORITHM
TERMINOLOGY
HARD-WORD - WHITEHOUSE_LAWN (2)
SOFT-WORD - WHITE-HOUSE_LAWN (3)
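A short sketch of the hard-word half of this terminology, assuming hard words are delimited by explicit markers such as underscores and camelCase boundaries. The function name `hard_words` is hypothetical. Finding soft words inside a hard word (WHITE-HOUSE) needs a vocabulary and is not attempted here.

```python
import re

def hard_words(identifier: str) -> list[str]:
    """Split an identifier at explicit markers: underscores and camelCase."""
    words = []
    for part in identifier.split("_"):
        # Runs of capitals (e.g. "HTTP") stay together; otherwise break
        # before each capital letter. Digit runs become their own word.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]

print(hard_words("whitehouse_lawn"))   # two hard words
print(hard_words("featureLocation"))
```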
![Page 14: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/14.jpg)
NORMALIZE ALGORITHM
STRLEN → STRING LENGTH
![Page 19: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/19.jpg)
MACHINE TRANSLATION APPROACH
EL PAPA VISITA LA IGLESIA
EL: THE
PAPA: FATHER, POTATO, POPE
VISITA: VISITS, VISITOR, HIT
LA IGLESIA: THE CHURCH
STRONG COHESION
![Page 24: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/24.jpg)
NORMALIZE ALGORITHM
STRLEN
S-TRLEN ST-RLEN STR-LEN STRL_EN STRLE_N
S_T_RLEN S-TR-LEN S_TRL_EN S_TRLE_N ST_R_LEN ST_RL_EN ST_RLE_N STR_L_EN STR_LE_N STRL_E_N
S_T_R_LEN S_T_RL_EN S_T_RLE_N S_TR_L_EN S_TR_LE_N S_TRL_E_N ST_R_L_EN ST_R_LE_N ST_RL_E_N STR_L_E_N
S_T_R_L_EN S_T_R_LE_N S_TR_L_E_N ST_R_L_E_N
S-T-R-L-E-N
WILDCARD EXPANSION
R*L*E*N*
E(ST) = {SET, STOP, STRING}
E(RLEN) = {RIFLEMEN}
E(STR) = {STEER, STRING}
E(LEN) = {LENDER, LENGTH}
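The two steps on this slide can be sketched as follows: enumerate every split of a hard word, then look up candidate expansions of a soft word with a wildcard pattern such as R*L*E*N*. This is an illustration under assumptions, not the Normalize implementation: the function names and the small `VOCABULARY` list are hypothetical, and the real tool evidently filters candidates further, since this plain pattern would also accept STEER for ST.

```python
import re
from itertools import product

def all_splits(hard_word: str) -> list[list[str]]:
    """Every way to cut a hard word into contiguous pieces (2**(n-1) splits)."""
    n = len(hard_word)
    splits = []
    for cuts in product([False, True], repeat=max(n - 1, 0)):
        parts, start = [], 0
        for i, cut in enumerate(cuts, start=1):
            if cut:  # cut between character i-1 and character i
                parts.append(hard_word[start:i])
                start = i
        parts.append(hard_word[start:])
        splits.append(parts)
    return splits

def wildcard_expansions(soft_word: str, vocabulary: list[str]) -> list[str]:
    """Words matching c1*c2*...cn*, e.g. r*l*e*n* for 'rlen'."""
    pattern = re.compile("".join(re.escape(c) + ".*" for c in soft_word))
    return [w for w in vocabulary if pattern.fullmatch(w)]

# Hypothetical vocabulary, for illustration only.
VOCABULARY = ["set", "stop", "string", "steer", "lender", "length", "riflemen"]

print(len(all_splits("strlen")))
print(wildcard_expansions("rlen", VOCABULARY))
print(wildcard_expansions("len", VOCABULARY))
```

The number of splits doubles with every extra character, which is one reason a splitter that returns only a few plausible splits is useful before expansion.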
![Page 32: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/32.jpg)
NORMALIZE ALGORITHM PART I
STR: STRING VS STEER
STRING: LENDER + LENGTH
STEER: LENDER + LENGTH
COHESION A VS COHESION B
1. FIND COHESION BY SUMMING LOG OF PROBABILITIES OF WORD PAIRS
2. SELECT EXPANSION THAT MAXIMIZES COHESION
STR → STRING
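A minimal sketch of Part I, assuming co-occurrence probabilities are available as a lookup table. The numbers in `PAIR_PROB` and the `FLOOR` smoothing value are invented for illustration; the slides do not give actual probabilities or say how unseen pairs are handled.

```python
import math

# Hypothetical co-occurrence probabilities (illustrative values only).
PAIR_PROB = {
    frozenset({"string", "length"}): 1e-4,
    frozenset({"string", "lender"}): 1e-8,
    frozenset({"steer", "length"}): 1e-7,
    frozenset({"steer", "lender"}): 1e-8,
}
FLOOR = 1e-12  # assumed probability for pairs missing from the table

def cohesion(word, others):
    """Step 1: sum the log probabilities of (word, other) pairs."""
    return sum(
        math.log(PAIR_PROB.get(frozenset({word, o}), FLOOR)) for o in others
    )

def best_expansion(candidates, others):
    """Step 2: pick the candidate expansion with the highest cohesion."""
    return max(candidates, key=lambda w: cohesion(w, others))

# Expand STR against the candidate expansions of the neighboring soft word LEN.
print(best_expansion(["string", "steer"], ["lender", "length"]))
```

Summing logs rather than multiplying probabilities avoids floating-point underflow when many small values are combined.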
![Page 38: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/38.jpg)
NORMALIZE ALGORITHM PART II
STR-LEN VS ST-RLEN
STRING LENGTH VS STOP RIFLEMEN
1. FIND COHESION OVER EXPANSIONS
2. SELECT EXPANSION OF THE SPLIT THAT MAXIMIZES COHESION
STRLEN → STRING LENGTH
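Part II can be sketched in the same style: each candidate split arrives with its chosen expansion, and the split whose expansion is most cohesive wins. The probability values, `FLOOR`, and the function names are assumptions made for this example.

```python
import math
from itertools import combinations

# Hypothetical co-occurrence probabilities (illustrative values only).
PAIR_PROB = {
    frozenset({"string", "length"}): 1e-4,
    frozenset({"stop", "riflemen"}): 1e-10,
}
FLOOR = 1e-12  # assumed probability for pairs missing from the table

def expansion_cohesion(words):
    """Sum log probabilities over all pairs of words in one expansion."""
    return sum(
        math.log(PAIR_PROB.get(frozenset(pair), FLOOR))
        for pair in combinations(words, 2)
    )

def best_split(expansions_by_split):
    """Return the (split, expansion) pair with the highest cohesion."""
    return max(expansions_by_split.items(),
               key=lambda item: expansion_cohesion(item[1]))

candidates = {
    "str-len": ["string", "length"],
    "st-rlen": ["stop", "riflemen"],
}
split, expansion = best_split(candidates)
print(split, expansion)
```

This sketch only compares splits with the same number of soft words. Comparing splits of different sizes would need some normalization of the sum, which the slides do not specify.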
![Page 43: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/43.jpg)
ADDING CONTEXT
DIR
E(DIR) = {DIRECTION, DIRECTORY}
CONTEXT = {FORWARD, BACKWARD}
FIND COHESION WITH CONTEXT WORDS IN ADDITION TO EXPANSIONS OF OTHER SOFT WORDS
USED IN BOTH PART I AND PART II
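A sketch of how context words could break the tie for DIR, assuming they are simply added to the set of companion words when cohesion is computed. The probabilities, `FLOOR`, and `expand_with_context` are hypothetical.

```python
import math

# Hypothetical co-occurrence probabilities (illustrative values only).
PAIR_PROB = {
    frozenset({"direction", "forward"}): 1e-4,
    frozenset({"direction", "backward"}): 1e-4,
    frozenset({"directory", "forward"}): 1e-7,
    frozenset({"directory", "backward"}): 1e-8,
}
FLOOR = 1e-12  # assumed probability for pairs missing from the table

def cohesion(word, companions):
    """Sum log probabilities of the word paired with each companion."""
    return sum(
        math.log(PAIR_PROB.get(frozenset({word, c}), FLOOR)) for c in companions
    )

def expand_with_context(candidates, other_expansions, context):
    """Score candidates against other soft-word expansions plus context words."""
    companions = list(other_expansions) + list(context)
    return max(candidates, key=lambda w: cohesion(w, companions))

# DIR is a single soft word, so there are no other expansions to compare with.
print(expand_with_context(["directory", "direction"], [], ["forward", "backward"]))
```

With an empty context both candidates score zero and the choice is arbitrary (here, whichever comes first), which is the situation the context words are meant to resolve.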
![Page 44: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/44.jpg)
NORMALIZE IMPLEMENTATION
USES GenTest TO SPLIT IDENTIFIERS
RETURNS MULTIPLE SPLITS
GOOGLE 5-GRAM DATASET
![Page 45: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/45.jpg)
EVALUATION
| Program | LoC | SLoC | Unique Ids |
|---|---|---|---|
| which-2.20 | 3,670 | 2,293 | 487 |
| a2ps-4.14 | 62,347 | 38,436 | 4,393 |

| Program | Selected Ids | Hard Words | Soft Words |
|---|---|---|---|
| which-2.20 | 487 | 903 | 1,214 |
| a2ps-4.14 | 211 | 459 | 618 |
![Page 48: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/48.jpg)
EVALUATION
THREE GROUPS OF IDENTIFIERS
STANDARD LIBRARY CALLS
NAMES FROM STANDARD HEADER FILES / KEYWORDS
DOMAIN NAMES

| Program | Filtered Ids | Reported Ids |
|---|---|---|
| which-2.20 | 152 | 335 |
| a2ps-4.14 | 46 | 166 |
![Page 49: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/49.jpg)
EXAMPLE EXPANSIONS
| id | Top 10 Expansion | Top Expansion |
|---|---|---|
| nextchar | next_character | next_character |
| indfound | index_found_need | index_found |
| optarg | option_are_g | optarg |
| itemno | i_them_not | itemno |
![Page 50: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/50.jpg)
RESEARCH QUESTIONS
WHAT IS THE OVERALL ACCURACY OF NORMALIZE?
DOES THE VOCABULARY USED HAVE A SIGNIFICANT IMPACT ON THE EXPANSION’S ACCURACY?
CAN THE EXPANDER INFORM THE SPLITTER?
CAN THE SPLITTER INFORM THE EXPANDER?
![Page 51: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/51.jpg)
ACCURACY ON DOMAIN IDS
![Page 52: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/52.jpg)
SOURCE OF EXPANSION WORDS
SOURCE CODE
INTERNAL DOCUMENTATION
MANUAL
![Page 53: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/53.jpg)
BEST VOCABULARY SOURCE?
![Page 54: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/54.jpg)
FUTURE WORK
EXPLORING DIFFERENT SOURCES OF CO-OCCURRENCE DATA
EXPLORING DIFFERENT WAYS OF CALCULATING PROBABILITIES
EXAMINING NORMALIZATION IN CONTEXT OF AN INFORMATION RETRIEVAL TASK
![Page 55: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/55.jpg)
SUMMARY
IDENTIFIERS ARE WRITTEN DIFFERENTLY THAN OTHER SOFTWARE DOCUMENTS
DEGRADES PERFORMANCE OF IR TECHNIQUES
NORMALIZE CURRENTLY EXPANDS ABOUT HALF OF SOFT WORDS CORRECTLY
![Page 56: Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b4a89c4a7959a0168b460e/html5/thumbnails/56.jpg)
QUESTIONS?
Need an identifier split? GenTest Splitter available at splitit.cs.loyola.edu