Introduction RQ1: Understandability Study RQ2: Community Study RQ3: Desirability Analysis Conclusion
Exploring Regular Expression Comprehension
Carl Chapman*, Peipei Wang, Kathryn T. Stolee
Sandia National Laboratories Albuquerque*, North Carolina State University
[email protected], [email protected], [email protected]
Nov 1st, 2017
Why should we use regular expressions?
A succinct way to express pattern matching.
Less code and flexible.
Why should we NOT use regular expressions?
Hard to write the correct regular expression.
Complicated to understand.
Difficult to test and debug.
Example of Bad Regex
Regex: ^[\s\u200c]+|[\s\u200c]+$
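The slide does not say what this regex is for. One plausible reading, offered here only as an assumption, is that it trims leading and trailing whitespace together with U+200C (zero-width non-joiner). A minimal Python sketch of that reading, where the helper name `trim` and the sample string are illustrative:

```python
import re

# Pattern from the slide: a run of whitespace or U+200C (zero-width non-joiner)
# anchored at the start of the string, or the same kind of run anchored at the end.
TRIM = re.compile(r"^[\s\u200c]+|[\s\u200c]+$")


def trim(text: str) -> str:
    """Remove leading and trailing whitespace / U+200C characters (assumed intent)."""
    return TRIM.sub("", text)


# Hypothetical sample input, not taken from the slides.
sample = " \u200c\thello world\u200c \n"
print(repr(trim(sample)))
```

Even with this reading in hand, the two near-identical alternatives and the invisible character make the intent hard to see at a glance, which is the point the slide is making.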
State of the Art
Tools for visual debugging (e.g., Regex101, Regexr)
Tools for graphical regular expressions (e.g., Rex, Brics)
Tools for automatic generation of regexes and strings (e.g., Rex, ReLIE)
Running Example
Which regular expression should we use?
A = [1-9][0-9]{0,2}
B = [1-9][0-9]?[0-9]?
C = [1-9]|[1-9][0-9]|[1-9][0-9][0-9]
Difference: How to express Double-Bounded repetition of digits?
A: repetition bounds using {}
B: digits can appear or not appear using ?
C: explicit repetitions using OR
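To make the running example concrete, the following sketch checks that A, B and C accept exactly the same strings on a finite set of candidates. It assumes whole-string matching via Python's `re.fullmatch`; the candidate list and the helper `accepts` are illustrative choices, and a finite check is evidence of equivalence rather than a proof.

```python
import re

A = r"[1-9][0-9]{0,2}"
B = r"[1-9][0-9]?[0-9]?"
C = r"[1-9]|[1-9][0-9]|[1-9][0-9][0-9]"


def accepts(pattern: str, text: str) -> bool:
    """True when the whole string is matched by the pattern."""
    return re.fullmatch(pattern, text) is not None


# Candidates: every integer from 0 to 9999 plus a few edge cases.
candidates = [str(n) for n in range(10000)] + ["", "007", "01", "1a"]

# Strings on which the three representations do not all agree.
disagreements = [
    s for s in candidates
    if len({accepts(p, s) for p in (A, B, C)}) != 1
]

# Strings accepted by A (and therefore by B and C if there are no disagreements).
accepted = [s for s in candidates if accepts(A, s)]

print(len(disagreements), accepted[0], accepted[-1], len(accepted))
```

On this candidate set all three forms accept the integers 1 through 999 written without leading zeros and nothing else.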
Regex Representation
Regex representation: syntactic expression
matching a digit (Custom Character Class): [0123456789], (0|1|2|3|4|5|6|7|8|9), [0-9], [\u30-\u39], \d, ...
matching at least one digit (Lower-Bounded): [0-9]+, [0-9][0-9]*, [0-9]{1,}, [0-9][0-9]{0,}, \d+, ...
matching at most three digits and at least one digit (Double-Bounded): [1-9][0-9]{0,2}, [1-9][0-9]?[0-9]?, [1-9]|[1-9][0-9]|[1-9][0-9][0-9], [1-9]\d{0,2}, ...
Research Goals
Explore regex comprehension
1 Which regex representations are most understandable? (understandability study)
2 Which regex representations are used most frequently? (community study)
3 Which regex representations should we use? (desirability analysis)
Regex Comparison Prerequisite
Equivalence class: a group of behaviorally equivalent regexes
Match the same set of character strings
Different regex representations
Equivalent DFAs (Deterministic Finite Automata)
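The definition above can be approximated in code by grouping patterns according to the set of strings they accept. The sketch below is not the authors' procedure: it enumerates all strings up to a small length over a small alphabet (both are assumptions chosen to keep the search tiny), so it can only suggest equivalence, whereas comparing DFAs would establish it. The function names are hypothetical.

```python
import re
from collections import defaultdict
from itertools import product


def language_sample(pattern, alphabet="0123456789", max_len=4):
    """Return the frozenset of strings (length <= max_len) fully matched by pattern."""
    compiled = re.compile(pattern)
    accepted = set()
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            s = "".join(chars)
            if compiled.fullmatch(s):
                accepted.add(s)
    return frozenset(accepted)


def group_by_behavior(patterns, **kwargs):
    """Group patterns whose sampled languages are identical."""
    groups = defaultdict(list)
    for p in patterns:
        groups[language_sample(p, **kwargs)].append(p)
    return list(groups.values())


patterns = [
    r"[0-9]+", r"[0-9][0-9]*", r"[0-9]{1,}", r"\d+",   # Lower-Bounded examples
    r"[1-9][0-9]{0,2}", r"[1-9][0-9]?[0-9]?",          # Double-Bounded examples
]

# A three-symbol alphabet keeps the enumeration to 121 strings.
groups = group_by_behavior(patterns, alphabet="019", max_len=4)
for group in groups:
    print(group)
```

With these inputs the patterns fall into two groups, one per intended behavior.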
Double-Bounded Group of Equivalence Classes
DBB GROUP
D1: [1-9][0-9]{0,2}
D2: [1-9][0-9]?[0-9]?
D3: [1-9]|[1-9][0-9]|[1-9][0-9][0-9]
(Figure: one shared DFA with edges labeled 1-9, 0-9, 0-9)
Five Equivalence Classes & 18 Regex Representations
LWB GROUP (using the abstract ‘A{2,}’ where A is any pattern)
L1: A{2,}
L2: AAA*
L3: AA+
CCC GROUP (using the concrete example of ‘[0-9a]’ and assuming an ASCII charset)
C1: [0-9a]
C2: [0123456789a]
C3: [^\x00-/:-`b-\177]
C4: [\da]
C5: (0|1|2|3|4|5|6|7|8|9|a), ([0-9]|a), (\d|a)
LIT GROUP (using the concrete example ‘\a\$>’ and assuming an ASCII charset)
T1: \a\$>
T2: \x07\x24\x3E
T3: \a[$]>
T4: \007\036\062
DBB GROUP (using the abstract ‘pB{1,3}s’ where B is any pattern; p and s are any (possibly empty) prefix, suffix)
D1: pB{1,3}s
D2: pBB?B?s
D3: pBs|pBBs|pBBBs
SNG GROUP (using the abstract ‘S{3}’ where S is any pattern)
S1: S{3}
S2: SSS
S3: S{3,3}
(Figure: each group is drawn around the DFA shared by its representations)
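As a sanity check on the CCC group, the sketch below tests each listed representation of ‘[0-9a]’ against every ASCII character, following the slide's ASCII assumption. The dictionary keys for the three C5 variants and the helper `matched_chars` are illustrative names, not taken from the talk.

```python
import re

# Representations of the CCC group as listed on the slide.
CCC = {
    "C1": r"[0-9a]",
    "C2": r"[0123456789a]",
    "C3": r"[^\x00-/:-`b-\177]",
    "C4": r"[\da]",
    "C5 (full OR)": r"(0|1|2|3|4|5|6|7|8|9|a)",
    "C5 (range OR)": r"([0-9]|a)",
    "C5 (\\d OR)": r"(\d|a)",
}

# ASCII assumption from the slide: only code points 0..127 are considered.
ascii_chars = [chr(code) for code in range(128)]


def matched_chars(pattern):
    """Set of single ASCII characters fully matched by the pattern."""
    compiled = re.compile(pattern)
    return {c for c in ascii_chars if compiled.fullmatch(c)}


expected = set("0123456789a")
results = {name: matched_chars(p) == expected for name, p in CCC.items()}
for name, ok in results.items():
    print(name, ok)
```

Outside ASCII the forms can diverge in Python: the negated class C3 and `\d` on `str` patterns both accept some non-ASCII characters, which is why the charset assumption matters.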
Understandability Study
RQ1: Which representations are most understandable?
180 Amazon's Mechanical Turk (MTurk) participants
60 regular expressions
26 equivalence groups (18 of two members, 8 of three members)
41 pairs of equivalent regexes
Study Example
Comprehension Metrics
1 Matching
2 Composition
    String    ‘RR*’ Oracle   P1     P2     P3     P4
1   “ARROW”   ✓              ✓      ✓      ✓      ✓
2   “qRs”     ✓              ✓      ✗      ✗      ?
3   “R0R”     ✓              ✓      ✓      ?      -
4   “qrs”     ✗              ✓      ✗      ✓      -
5   “98”      ✗              ✗      ✗      ✗      -
    Score     1.00           0.80   0.80   0.50   1.00
✓ = match, ✗ = not a match, ? = unsure, - = left blank
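The matching metric can be sketched as follows. Two things are assumptions inferred from the table rather than stated on the slide: the oracle uses substring matching (so “ARROW” matches ‘RR*’), and the score is the fraction of correct answers among the strings a participant actually judged, with unsure and blank responses left out. The participant responses below are as read from the table.

```python
import re

PATTERN = r"RR*"
STRINGS = ["ARROW", "qRs", "R0R", "qrs", "98"]

# Responses per participant: True = match, False = not a match,
# "?" = unsure, None = left blank (values as read from the slide's table).
RESPONSES = {
    "P1": [True, True, True, True, False],
    "P2": [True, False, True, False, False],
    "P3": [True, False, "?", True, False],
    "P4": [True, "?", None, None, None],
}

# Oracle: does the regex match anywhere in the string? (assumed search semantics)
oracle = [re.search(PATTERN, s) is not None for s in STRINGS]


def matching_score(answers, oracle):
    """Fraction of correct answers among definite (True/False) responses."""
    graded = [(a, o) for a, o in zip(answers, oracle) if isinstance(a, bool)]
    correct = sum(1 for a, o in graded if a == o)
    return correct / len(graded)  # assumes at least one definite answer


scores = {p: matching_score(a, oracle) for p, a in RESPONSES.items()}
print(scores)
```

Under these assumptions P4 scores 1.00 from a single definite answer, which shows that the metric rewards accuracy on attempted strings rather than coverage.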
     Regex        Composition   Score
P1   (q4fab|ab)   xyzq4fab      1
P2   (q4fab|ab)   acb           0
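The composition metric asks a participant to write a string that the regex matches. A minimal sketch of the scoring, assuming substring matching (consistent with “xyzq4fab” receiving a score of 1) and using a hypothetical helper name:

```python
import re


def composition_score(pattern: str, composed: str) -> int:
    """1 if the regex matches somewhere in the composed string, else 0."""
    return 1 if re.search(pattern, composed) else 0


# Participant, regex, composed string (from the slide).
attempts = [
    ("P1", r"(q4fab|ab)", "xyzq4fab"),
    ("P2", r"(q4fab|ab)", "acb"),
]

composition_scores = {
    participant: composition_score(pattern, composed)
    for participant, pattern, composed in attempts
}
print(composition_scores)
```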
Which representations are most understandable?
Double-Bounded Groups
((q4f)?ab)
((q4f){0,1}ab)
(q4fab|ab)
(deedo(do)?)
(dee(do){1,2})
(deedo|deedodo)
Which representations are most understandable?
Regex             Match   Comp
((q4f){0,1}ab)    82.93   50.00
((q4f)?ab)        79.25   40.00
(q4fab|ab)        84.50   60.00
Regex             Match   Comp
(dee(do){1,2})    84.83   66.67
(deedo(do)?)      77.17   60.00
(deedo|deedodo)   90.00   63.33
Topological Ordering
D2: ((q4f)?ab)
D1: ((q4f){0,1}ab)
D3: (q4fab|ab)
D2: (deedo(do)?)
D1: (dee(do){1,2})
D3: (deedo|deedodo)
Understandability Ordering: D3 > D1 > D2
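One way to arrive at such an ordering is sketched below. The edge rule is an assumption, since the slides do not spell it out: a representation is placed ahead of another only when it scores higher on both matching and composition within the same regex pair or triple. The resulting graph is then sorted with Python's `graphlib` (3.9+). Scores are the ones reported for the two Double-Bounded groups; the group labels "q4f" and "dee" are illustrative.

```python
from graphlib import TopologicalSorter
from itertools import permutations

# (match %, composition %) per representation, from the results tables.
SCORES = {
    "q4f": {"D1": (82.93, 50.00), "D2": (79.25, 40.00), "D3": (84.50, 60.00)},
    "dee": {"D1": (84.83, 66.67), "D2": (77.17, 60.00), "D3": (90.00, 63.33)},
}


def dominance_edges(scores):
    """Yield (better, worse) when `better` wins on both metrics (assumed rule)."""
    for a, b in permutations(scores, 2):
        if scores[a][0] > scores[b][0] and scores[a][1] > scores[b][1]:
            yield a, b


# graphlib expects {node: predecessors}; here a predecessor is a
# representation that should come earlier, i.e. a more understandable one.
predecessors = {rep: set() for rep in ("D1", "D2", "D3")}
for group in SCORES.values():
    for better, worse in dominance_edges(group):
        predecessors[worse].add(better)

ordering = list(TopologicalSorter(predecessors).static_order())
print(" > ".join(ordering))
```

In the second group D3 and D1 disagree across the two metrics, so no edge is added there; the edge from the first group is what fixes their relative order.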
Community Study
RQ2: Which representations have the strongest community support based on frequency?
13,597 distinct regex patterns from 1,544 GitHub Python projects
Mapping regexes to representations: PCRE feature, string pattern, token stream
Frequent Representations
Rep   Example          nPatterns   % patterns   nProjects   % projects
D1    ((q4f){0,1}ab)   346         2.5%         234         15.2%
D2    ((q4f)?ab)       1,871       13.8%        646         41.8%
D3    (q4fab|ab)       10          0.1%         27          1.7%
Community Ordering: D2 > D1 > D3
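The percentages in the table can be reproduced from the raw counts, assuming the denominators are the corpus totals given for the community study (13,597 patterns and 1,544 projects). Ranking by pattern count is likewise an assumption about how the community ordering was derived; ranking by project count gives the same result here.

```python
TOTAL_PATTERNS = 13_597
TOTAL_PROJECTS = 1_544

# Representation -> (nPatterns, nProjects), from the table.
COUNTS = {"D1": (346, 234), "D2": (1_871, 646), "D3": (10, 27)}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


percentages = {
    rep: (pct(n_patterns, TOTAL_PATTERNS), pct(n_projects, TOTAL_PROJECTS))
    for rep, (n_patterns, n_projects) in COUNTS.items()
}

# Most frequently used representation first.
community_order = sorted(COUNTS, key=lambda rep: COUNTS[rep][0], reverse=True)

print(percentages)
print(" > ".join(community_order))
```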
Desirability Analysis
RQ3: Which regex representations should we use?
A = [1-9][0-9]{0,2} (D1)
B = [1-9][0-9]?[0-9]? (D2)
C = [1-9]|[1-9][0-9]|[1-9][0-9][0-9] (D3)
Topological Ordering
Understandability: D3 > D1 > D2
Community: D2 > D1 > D3
Ordering Results
Equivalence Class        Understandability   Community
Custom Character Class   C1 C5 C3 C4 C2      C1 C3 C2 C4 C5
Double-Bounded           D3 D1 D2            D2 D1 D3
Lower-Bounded            L3 L2               L3 L2 L1
Single-Bounded           S2 S1               S2 S1 S3
Literal                  T1 T3 T2 T4         T1 T3 T2 T4
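A quick way to read this table is to ask, per equivalence class, whether the most understandable representation is also the most common one. The sketch below encodes the two orderings as lists (most preferred first) and reports the classes where the top choices differ; the variable names are illustrative.

```python
# Class -> (understandability ordering, community ordering), most preferred first.
ORDERINGS = {
    "Custom Character Class": (["C1", "C5", "C3", "C4", "C2"],
                               ["C1", "C3", "C2", "C4", "C5"]),
    "Double-Bounded": (["D3", "D1", "D2"], ["D2", "D1", "D3"]),
    "Lower-Bounded": (["L3", "L2"], ["L3", "L2", "L1"]),
    "Single-Bounded": (["S2", "S1"], ["S2", "S1", "S3"]),
    "Literal": (["T1", "T3", "T2", "T4"], ["T1", "T3", "T2", "T4"]),
}

# True when both orderings agree on the top-ranked representation.
top_agreement = {
    name: understandability[0] == community[0]
    for name, (understandability, community) in ORDERINGS.items()
}

disagreeing = [name for name, same in top_agreement.items() if not same]
print(disagreeing)
```

Only the Double-Bounded class disagrees at the top, where the two orderings are exact reverses of each other.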
What We Learn
1 Commonly used regexes are NOT always easier to understand!
2 Replace * with + when possible.
3 Use literal characters! If not possible, use hex encoding.
4 Use the range feature for character sets when possible.
letters a to g: [a-g], [abcdefg], (a|b|c|d|e|f|g)
Limitations
Five types of equivalence classes
Python code
Regex length is short
ab|abab
thisbadchoice|thisbadchoicethisbadchoice
DFA size is small: 2 to 8
...
Post Analysis
ANOVA analysis: which factors can impact comprehension?
Regex representation
DFA size (matching: *α = 0.05, composition: **α = 0.01)
Regex length
Opportunities for Future Work!
DFA size
How does DFA size impact comprehension?
More types of equivalence classes
Consider multiline option, case insensitivity, backreferences?
Automatic identification
Could we automatically build equivalence classes?
Questions?
Peipei [email protected]
North Carolina State University