A Lightweight Approach to Uncover Technical Information in Unstructured Data
Nicolas Bettenburg, Bram Adams, Ahmed E. Hassan (Queen’s University, Kingston, ON)
Michel Smidt (University of Bremen, Germany)
The code after "if (callback.isAcceleratorInUse(SWT.ALT | character))" inside Eclipse's MenuManager.java removes the mnemonic, but it seems like Eclipse should be checking "isAcceleratorInUse" only for top level menumanagers like File,Edit,...,Help, etc.
“Our developers have a severe problem.”
NLP:
([ Our_PRP$ developers_NNS ]) <: have_VBP :> ([ a_DT severe_JJ problem_NN ]) ._.
PRP$: pronoun, possessive
NNS: noun, common, plural
VBP: verb, present tense
DT: determiner
JJ: adjective
NN: noun, common, singular
Structured Text
NLP? Can’t deal with the source code parts.
PARSERS? Can’t deal with the natural language parts.
Past Solutions
infoZilla (Bettenburg et al., MSR’08)
• Based on heuristics
• Stack traces
• Code snippets at block level
• Patches
Miler (Bacchelli et al., ICSE’10, ICPC’10)
• Based on heuristics
• Classifies lines as source code
• Classifies documents as containing code
• Finds class names
infoZilla (Bettenburg et al., MSR’08) and Miler (Bacchelli et al., ICPC’10):
• Only specific kinds of technical information!
• New heuristics are needed to extend them (complex, error prone)
• For some kinds of technical information, infeasible
Build ID: M20070212-1330
Steps To Reproduce:
1. Create a plugin for eclipse that includes a key binding for "M1+S" (ie. Alt+S) where S is any letter that is used as a mnemonic in one of the top level menus. Since eclipse uses "S" as the mnemonic for Help > &Software Updates, "S" is sufficient.
2. Launch the plugin as part of Eclipse IDE
3. Press Alt+H to bring down the Help menu (to go along with our example in #1)
BUG: Notice "Software Updates" is missing its mnemonic.
More information:
The code after "if (callback.isAcceleratorInUse(SWT.ALT | character))" inside Eclipse's MenuManager.java removes the mnemonic, but it seems like Eclipse should be checking "isAcceleratorInUse" only for top level menumanagers like File,Edit,...,Help, etc. :
/* (non-Javadoc)
 * @see org.eclipse.jface.action.IContributionItem#update(java.lang.String)
 */
public void update(String property) {
    IContributionItem items[] = getItems();
    for (int i = 0; i < items.length; i++) {
        items[i].update(property);
    }
    [...]
}
Any status on this bug?
I'd consider any contributions for M6 (API) or M7 (non-API) [...]
A 3.5 fix would be to make that behaviour optional in MenuManager with API and off by default early in 3.5, and to have the WorkbenchActionBuilder contributed MenuManagers and actionSets/editorActions contributed MenuManagers turn it on (if I can find MenuManagers in the correct place).
I'd like us to work with the SWT team to make sure we understand what the correct platform behavior is, and make sure that we aren't getting in the way of that. The current behavior (i.e. turning off mnemonics) seems odd to me, in general. If we're going to fix this, we should fix it properly.
Challenge!
Spelling and Grammar Checkers
... are really good at finding “what’s not right” in natural language text!
A 3.5 fix would be to make taht behaviour optional in MenuManager with API and off by default early in 3.5, and to have the WorkbenchActionBuilder contributed MenuManagers and actionSets/editorActions contributed MenuManagers turn it on (if I can find MenuManagers in the corect place).
Actual Spelling Mistakes (taht, corect)!
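The idea of using a spell checker as a detector can be sketched in a few lines. This is a minimal illustration, not the authors' tool: `DICTIONARY` is a hypothetical, hand-written word list standing in for a real spell checker's dictionary, and `flag_unknown_tokens` is a name made up for this example. It shows why plain dictionary lookup flags technical terms and genuine typos alike.

```python
import re

# Hypothetical miniature word list (an assumption for this sketch);
# a real setup would rely on a full spell-checker dictionary instead.
DICTIONARY = {
    "to", "make", "that", "behaviour", "optional", "in",
    "with", "and", "off", "by", "default",
}


def flag_unknown_tokens(text, dictionary=DICTIONARY):
    """Return, in order, the alphabetic tokens the dictionary does not know."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return [tok for tok in tokens if tok.lower() not in dictionary]


fragment = "to make taht behaviour optional in MenuManager with API and off by default"
flagged = flag_unknown_tokens(fragment)
print(flagged)
```

On this fragment the lookup flags `taht`, `MenuManager` and `API`: two technical terms, plus one actual spelling mistake that still has to be told apart from them.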
Add Heuristics
H1: Camel Case (camelCase, CamelCase, CamelCASE, ...)
H2: Programming Language Keywords (printf, fork, fi, ...)
H3: Special Characters (tree();)
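The three heuristics can be sketched as simple per-token tests. The slides do not say how the heuristics are implemented or how they are combined with the spell checker's output, so everything below is an assumption: the regular expressions, the tiny `KEYWORDS` set (seeded only with the examples named on the slide), and the function name `looks_technical` are all hypothetical.

```python
import re

# H1 (assumed form): a lowercase letter directly followed by an uppercase
# one, which matches camelCase, CamelCase and CamelCASE alike.
CAMEL_CASE = re.compile(r"[a-z][A-Z]")

# H2: hypothetical keyword list holding only the examples from the slide.
KEYWORDS = {"printf", "fork", "fi"}

# H3 (assumed form): characters that are common in code but rare in prose.
SPECIAL_CHARS = re.compile(r"[(){}\[\];=<>|&]")


def looks_technical(token):
    """Return True if any of the three heuristics fires for the token."""
    return bool(
        CAMEL_CASE.search(token)
        or token in KEYWORDS
        or SPECIAL_CHARS.search(token)
    )


for tok in ["MenuManager", "printf", "tree();", "taht", "problem"]:
    print(tok, looks_technical(tok))
```

Under these assumptions `MenuManager`, `printf` and `tree();` are treated as technical, while the plain misspelling `taht` is not, which is the distinction the heuristics are meant to add on top of dictionary lookup.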
Evaluation
Manually annotated 20 complete Bug Reports and Discussions from ECLIPSE.
Manually annotated 20 complete email discussions from the POSTGRESQL developers' mailing list.
Annotation GUI
Precision / Recall
$$\mathrm{Precision}(S_i) = \frac{TP_{S_i}}{TP_{S_i} + FP_{S_i}}$$
$$\mathrm{Recall}(S_i) = \frac{TP_{S_i}}{TP_{S_i} + FN_{S_i}}$$
TP = We annotated and tool annotated
FP = Tool annotated, we did not
FN = We annotated, tool did not
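The two measures can be computed directly from the manual and the tool annotations. The sketch below assumes that each annotation is represented as a set of flagged items for one sample S_i; the function name `precision_recall` and the example items are illustrative, not taken from the evaluation data.

```python
def precision_recall(manual, tool):
    """Compute (precision, recall) for one sample from two sets of annotations."""
    manual, tool = set(manual), set(tool)
    tp = len(manual & tool)   # we annotated and tool annotated
    fp = len(tool - manual)   # tool annotated, we did not
    fn = len(manual - tool)   # we annotated, tool did not
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Illustrative annotations only (not from the study).
manual = {"MenuManager", "WorkbenchActionBuilder", "actionSets", "editorActions"}
tool = {"MenuManager", "WorkbenchActionBuilder", "taht"}
p, r = precision_recall(manual, tool)
print(f"precision={p:.2%} recall={r:.2%}")
```

Here the tool agrees with the manual annotation on two items, adds one false positive and misses two items, giving a precision of 2/3 and a recall of 1/2.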
Results
Spellchecker   Precision   Recall
JOrtho         88.01%      64.31%
Jazzy          84.16%      68.30%
Hunspell       86.40%      68.34%
Hunspell is used by OpenOffice and the Mozilla Suite.
Comparison: Line-wise classification of source code
 1  Launch the plugin as part of Eclipse IDE 3. Press Alt+H to
 2  bring down the Help menu (to go along with our example in #1)
 3
 4  BUG: Notice "Software Updates" is missing its mnemonic.
 5
 6  public void update(String property) {
 7      IContributionItem items[] = getItems();
 8      for (int i = 0; i < items.length; i++) {
 9          items[i].update(property);
10      }
11  }
12
13  Any status on this bug?
Approach           Precision   Recall
Our approach       89.27%      86.46%
State-of-the-Art   66.13%      69.37%
Summary
[Figure: mozilla user names (paulc, zhangchunlin, kbrosnan, and others), with labels UI, JavaScript Engine, XML Parser, Internet Explorer]