Aspiring Minds | Automata
![Page 1: Aspiring Minds | Automata](https://reader035.fdocuments.net/reader035/viewer/2022062223/588394d61a28ab2b568b4ae7/html5/thumbnails/1.jpg)
Aspiring Minds
www.aspiringminds.com
Grading Programs using Machine Learning
Varun Aggarwal
Presented at KDD, 2014
![Page 2: Aspiring Minds | Automata](https://reader035.fdocuments.net/reader035/viewer/2022062223/588394d61a28ab2b568b4ae7/html5/thumbnails/2.jpg)
Programming Assessments: Existing solutions
• Manual evaluation: cannot scale; not standardized
• Test-case based evaluation:
  • High false-positives – hard-coded output, inadvertent errors
  • High false-negatives – correct code that is not efficient
• Similarity metric between control flow graphs or syntax trees:
  • Needs to handle multiple correct implementations, which the approach does not accommodate in theory
  • No mapping from the metric to objective feedback
![Page 3: Aspiring Minds | Automata](https://reader035.fdocuments.net/reader035/viewer/2022062223/588394d61a28ab2b568b4ae7/html5/thumbnails/3.jpg)
Automatic grading of programs – Why?
• Grading is widely performed; automating it will save professors and TAs a lot of time.
• Companies can recruit efficiently.
• MOOCs need automated open-response assessment to be truly effective. Such systems have not yet been scaled.
![Page 4: Aspiring Minds | Automata](https://reader035.fdocuments.net/reader035/viewer/2022062223/588394d61a28ab2b568b4ae7/html5/thumbnails/4.jpg)
Our Approach
Automata – Automatic program evaluation engine
• Machine Learning based scoring engine: A model to predict the logical correctness of a program, given the control and data dependencies it possesses.
• Evaluation of programming best practices: Lint-styled rule-based system to detect programs not following programming best practices.
• Asymptotic complexity evaluation: Measures the run-time of the code for various input sizes and empirically derives the complexity.
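The slides do not say how the complexity is derived from the measured run-times. One simple possibility, sketched here purely as an assumption, is to time the program at input sizes that double each step and look at the ratio between consecutive run-times. The function name `classify_growth`, the `growth_t` labels and the thresholds are all hypothetical, and the sketch cannot tell O(n) apart from O(n log n).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the growth classes this sketch can tell apart. */
typedef enum {
    GROWTH_CONSTANT,
    GROWTH_LINEAR,
    GROWTH_QUADRATIC,
    GROWTH_OTHER
} growth_t;

/*
 * Classify growth from run-times measured at input sizes that double
 * each step (n, 2n, 4n, ...). times[k] is the run-time at the k-th size.
 * The mean ratio times[k+1] / times[k] is close to 1 for O(1),
 * close to 2 for O(n) and close to 4 for O(n^2).
 * The cut-off values below are illustrative assumptions.
 */
growth_t classify_growth(const double *times, size_t count)
{
    double ratio_sum = 0.0;
    double mean_ratio;
    size_t k;

    assert(times != NULL && count >= 2);
    for (k = 0; k + 1 < count; k++) {
        assert(times[k] > 0.0);
        ratio_sum += times[k + 1] / times[k];
    }
    mean_ratio = ratio_sum / (double)(count - 1);

    if (mean_ratio < 1.5) return GROWTH_CONSTANT;
    if (mean_ratio < 3.0) return GROWTH_LINEAR;
    if (mean_ratio < 6.0) return GROWTH_QUADRATIC;
    return GROWTH_OTHER;
}
```

The function takes already-measured times rather than timing code itself, which keeps the classification step separate from the noisier measurement step.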
![Page 5: Aspiring Minds | Automata](https://reader035.fdocuments.net/reader035/viewer/2022062223/588394d61a28ab2b568b4ae7/html5/thumbnails/5.jpg)
Why do programming modules give a better test-shortlist rate?
• Programming has more predictive power than Logical ability in identifying good performers.
• Because Logical has lower predictive power, a higher cut-off must be applied to it than to Programming to get the same organizational efficiency.
• The higher a person's Programming capability, the lower the Logical score required.
• If a person scores below a given threshold on Programming, even a higher Logical ability does not help.
![Page 6: Aspiring Minds | Automata](https://reader035.fdocuments.net/reader035/viewer/2022062223/588394d61a28ab2b568b4ae7/html5/thumbnails/6.jpg)
ML based scoring
• Evaluation rubric: understanding the human process
• Features: problem and language independent
• Machine learning model: built from graded programs; takes ungraded programs and outputs predicted grades
![Page 10: Aspiring Minds | Automata](https://reader035.fdocuments.net/reader035/viewer/2022062223/588394d61a28ab2b568b4ae7/html5/thumbnails/10.jpg)
OBJECTIVE: To print the pattern of integers
1
2 3
3 4 5
4 5 6 7
An implementation
#include <stdio.h>

void print_1(int N) {
    int i, j, count;
    for (i = 1; i <= N; i++) {
        printf("\n");              /* start a new row */
        count = i;                 /* row i starts at the value i */
        for (j = 0; j < i; j++) {
            printf("%d ", count);
            count++;
        }
    }
}
What does a grader look for?
1. Are there loops? Are there print statements?
2. Is there a nested-loop structure?
3. Is the conditional in the inner loop dependent on
   - a variable modified in the outer loop?
   - a variable used in the conditional of the outer loop?
![Page 11: Aspiring Minds | Automata](https://reader035.fdocuments.net/reader035/viewer/2022062223/588394d61a28ab2b568b4ae7/html5/thumbnails/11.jpg)
Grammar for expressing features
• Simple features
  • Keywords and Tokens (Counts):
    • Tokens like for, if, return, break; function calls like printf, strrev, strcat; declarations like int, char
    • Operators: the arithmetic, logical and relational operators used
    • Character constants like ‘\0’, ‘ ’, ‘65’, ‘96’
• Capturing logical constructs (Interactions)
  • Control flow structure
  • Data-dependencies
  • Data-dependencies in context of control-flow
![Page 12: Aspiring Minds | Automata](https://reader035.fdocuments.net/reader035/viewer/2022062223/588394d61a28ab2b568b4ae7/html5/thumbnails/12.jpg)
CONTROL FEATURES – COUNTS
• Counts of control-related keywords/tokens. Ex. count(for) = 2, count(for-in-for) = 1, count(while) = 0
• Control-context of these keywords: the print command appears as loop(loop(print))
TARGET PROGRAM
#include <stdio.h>

void print(int N) {
    int i, j, count;
    for (i = 1; i <= N; i++) {
        printf("\n");
        count = i;
        for (j = 0; j < i; j++) {
            printf("%d ", count);
            count++;
        }
    }
}
CONTROL FLOW GRAPH
[Figure: control flow graph of the target program. Nodes: i = 1, i <= N, count = i, j = 0, j < i, print(count), count++, j++, i++, END. Each node is placed in its scope: parent scope, Loop 1 or Loop 2.]
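The three counts quoted on this slide can be reproduced with a small scanner. The following is only a sketch under simplifying assumptions, not the feature extractor behind Automata: it ignores comments and string literals, and it misses some brace-less nestings. The struct `control_counts` and the function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_SCOPES 64

/* Feature counts for one program; struct and field names are hypothetical. */
typedef struct {
    int for_count;   /* count(for) */
    int while_count; /* count(while) */
    int for_in_for;  /* count(for-in-for) */
} control_counts;

static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Skip a parenthesised loop header that starts at or after index i. */
static size_t skip_header(const char *src, size_t i)
{
    int level = 0;
    while (isspace((unsigned char)src[i])) i++;
    if (src[i] != '(') return i;
    do {
        if (src[i] == '(') level++;
        else if (src[i] == ')') level--;
        i++;
    } while (src[i] != '\0' && level > 0);
    return i;
}

control_counts count_control_features(const char *src)
{
    control_counts out = {0, 0, 0};
    int scope_is_for[MAX_SCOPES]; /* per open '{': does it start a for body? */
    int scopes = 0;
    int open_fors = 0;   /* for bodies currently open */
    int pending_for = 0; /* for header seen, body not yet started */
    size_t i = 0;

    while (src[i] != '\0') {
        char c = src[i];
        if (is_ident_char(c)) {
            size_t start = i;
            size_t len;
            while (is_ident_char(src[i])) i++;
            len = i - start;
            if (len == 3 && strncmp(src + start, "for", 3) == 0) {
                out.for_count++;
                if (open_fors > 0 || pending_for) out.for_in_for++;
                pending_for = 1;
                i = skip_header(src, i);
            } else if (len == 5 && strncmp(src + start, "while", 5) == 0) {
                out.while_count++;
            }
            continue;
        }
        if (c == '{') {
            assert(scopes < MAX_SCOPES);
            scope_is_for[scopes++] = pending_for;
            open_fors += pending_for;
            pending_for = 0;
        } else if (c == '}' && scopes > 0) {
            open_fors -= scope_is_for[--scopes];
        } else if (c == ';') {
            pending_for = 0; /* a brace-less for body has ended */
        }
        i++;
    }
    return out;
}
```

Passing the source text of the target program as a string should yield count(for) = 2, count(for-in-for) = 1 and count(while) = 0, the values given on the slide.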
![Page 13: Aspiring Minds | Automata](https://reader035.fdocuments.net/reader035/viewer/2022062223/588394d61a28ab2b568b4ae7/html5/thumbnails/13.jpg)
DATA OPERATION FEATURES IN CONTROL-CONTEXT
• Counts of data-related tokens in context of the control structure. Ex. count(block1 : loop(loop(++))) = 2, count(block1 : loop(loop_cond(<))) = 1
• Capture control-context of data-dependencies in groups of expressions. Ex. for the expressions i++ and j < i:
  • var(i) is related to var(j): appearing in a loop(loop_cond)
  • var(i) was previously incremented: appearing in a loop
  • The relation and the increment happen in the same block
CONTROL FLOW INFORMATION ANNOTATED IN A-D-D GRAPH
[Figure: graph of the target program. Nodes: i = 1, i <= N, count = i, i++, j = 0, j < i, print(count), count = 0, count++, j++. Each node is annotated with its scope: parent scope, Loop 1 or Loop 2.]
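A count such as count(loop(loop(++))) = 2 can be illustrated by counting `++` operators according to how many loops enclose them. The sketch below is an assumption-laden illustration rather than the method used in Automata: it requires every loop body to be braced, ignores comments and string literals, treats a loop's header as part of that loop, and does not model the `block1` qualifier. The function name is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_SCOPES 64

/*
 * Count "++" operators enclosed by exactly target_depth loops.
 * A "++" in the header of a loop counts as inside that loop.
 */
int count_increments_at_loop_depth(const char *src, int target_depth)
{
    int scope_is_loop[MAX_SCOPES]; /* per open '{': does it start a loop body? */
    int scopes = 0;
    int loop_depth = 0;   /* loop bodies currently open */
    int pending_loop = 0; /* loop keyword seen, body '{' not yet reached */
    int count = 0;
    size_t i;

    for (i = 0; src[i] != '\0'; i++) {
        char c = src[i];
        if (isalpha((unsigned char)c) || c == '_') {
            size_t start = i;
            size_t len;
            while (isalnum((unsigned char)src[i + 1]) || src[i + 1] == '_') i++;
            len = i - start + 1;
            if ((len == 3 && strncmp(src + start, "for", 3) == 0) ||
                (len == 5 && strncmp(src + start, "while", 5) == 0)) {
                pending_loop = 1;
            }
        } else if (c == '+' && src[i + 1] == '+') {
            if (loop_depth + pending_loop == target_depth) count++;
            i++; /* consume the second '+' */
        } else if (c == '{') {
            assert(scopes < MAX_SCOPES);
            scope_is_loop[scopes++] = pending_loop;
            loop_depth += pending_loop;
            pending_loop = 0;
        } else if (c == '}' && scopes > 0) {
            loop_depth -= scope_is_loop[--scopes];
        }
    }
    return count;
}
```

On the target program, depth 2 picks up `j++` and `count++`, and depth 1 picks up `i++`, which agrees with the count of 2 quoted for loop(loop(++)).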
![Page 14: Aspiring Minds | Automata](https://reader035.fdocuments.net/reader035/viewer/2022062223/588394d61a28ab2b568b4ae7/html5/thumbnails/14.jpg)
Case study
• Deployed Automata in the recruitment process of a major product-based company
• Analyzed the performance improvement from using Automata over a selection criterion based on test-case pass
• 22.6% of the candidates who were not being shortlisted through test-case pass were shortlisted using Automata.
![Page 15: Aspiring Minds | Automata](https://reader035.fdocuments.net/reader035/viewer/2022062223/588394d61a28ab2b568b4ae7/html5/thumbnails/15.jpg)
Experimental Results
Sort Problem
![Page 16: Aspiring Minds | Automata](https://reader035.fdocuments.net/reader035/viewer/2022062223/588394d61a28ab2b568b4ae7/html5/thumbnails/16.jpg)
Doing it the one-class way!
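| Problem | All features: Mean | All features: Min25 | Basic features: Mean | Basic features: Min25 |
|---|---|---|---|---|
| 1 | 0.57 | 0.61 | 0.52 | 0.56 |
| 2 | 0.80 | 0.83 | 0.72 | 0.75 |
| 3 | 0.75 | 0.81 | 0.59 | 0.73 |
| 4 | 0.81 | 0.81 | 0.75 | 0.75 |
| 5 | 0.68 | 0.69 | 0.55 | 0.61 |

Betters the test-case score in all problems but one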
![Page 17: Aspiring Minds | Automata](https://reader035.fdocuments.net/reader035/viewer/2022062223/588394d61a28ab2b568b4ae7/html5/thumbnails/17.jpg)
How good is the final ML-based score?
Validation correlation >= 0.79. Matches the inter-rater correlation between two human raters.

| Problem | # of features | Cross-val correl | Train correl | Validation correl | Test case score |
|---|---|---|---|---|---|
| 1 | 80 | 0.61 | 0.85 | 0.79 | 0.54 |
| 2 | 68 | 0.77 | 0.93 | 0.91 | 0.80 |
| 3 | 193 | 0.91 | 0.98 | 0.90 | 0.64 |
| 4 | 66 | 0.90 | 0.94 | 0.90 | 0.80 |
| 5 | 87 | 0.81 | 0.92 | 0.84 | 0.84 |
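The slides report correlations without naming the coefficient. Assuming it is the Pearson correlation between the predicted scores $x_i$ and the human grades $y_i$ over $n$ programs (an assumption, not stated in the source), each value in the table would be computed as:

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

Here $\bar{x}$ and $\bar{y}$ are the means of the two score lists. Under the same assumption, the inter-rater figure would be this formula applied to the grades of the two human raters.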
![Page 18: Aspiring Minds | Automata](https://reader035.fdocuments.net/reader035/viewer/2022062223/588394d61a28ab2b568b4ae7/html5/thumbnails/18.jpg)
Can we get insight?
Features for the FindDigit problem analyzed. Given a multi-digit number and a digit, one has to find the number of times the digit appears in the number.
• The most contributing feature for the FindDigit problem:
int findDigit(int N, int digit){
    …
    …
    LOOP (N != <constant value>){
        …
        N = N / <constant value>
        …
    }
}
![Page 19: Aspiring Minds | Automata](https://reader035.fdocuments.net/reader035/viewer/2022062223/588394d61a28ab2b568b4ae7/html5/thumbnails/19.jpg)
Yes, we can!
• The most contributing feature for the FindDigit problem:
int findDigit(int N, int digit){
    …
    LOOP (N != <constant value>){
        …
        N = N / <constant value>
        …
    }
    …
}
A program exhibiting this feature:
int findDigit(int N, int digit){
    ...
    while(N != 0){
        d = N%10;
        if(d == digit)
            ...
        N = N / 10;
    }
}
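For reference, one complete program with the loop structure highlighted above could look as follows. The slide elides the surrounding statements, so the counter, the return value and the input check are assumptions made for this sketch, which also assumes a non-negative N.

```c
#include <assert.h>

/*
 * Count how many times `digit` (0 to 9) appears in the decimal form of N.
 * Assumes N >= 0. For N == 0 the loop body never runs and the result is 0.
 */
int findDigit(int N, int digit)
{
    int count = 0;

    assert(N >= 0 && digit >= 0 && digit <= 9);
    while (N != 0) {            /* LOOP (N != <constant value>) */
        int d = N % 10;         /* last decimal digit of N */
        if (d == digit) {
            count++;
        }
        N = N / 10;             /* N = N / <constant value> */
    }
    return count;
}
```

The loop condition and the division by a constant are exactly the two statements the feature template captures; everything else can vary between correct solutions.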
![Page 20: Aspiring Minds | Automata](https://reader035.fdocuments.net/reader035/viewer/2022062223/588394d61a28ab2b568b4ae7/html5/thumbnails/20.jpg)
Evaluation Rubric
| Score | Interpretation |
|---|---|
| 5 | Completely correct and efficient: An efficient implementation of the problem using the right control structures and data-dependencies. |
| 4 | Correct with some silly errors: Correct control structures and closely matching data-dependencies. Some silly mistakes cause the code to fail test cases. |
| 3 | Inconsistent logical structures: The right control structures exist, with few correct data-dependencies. |
| 2 | Emerging basic structures: Appropriate keywords and tokens present, showing some understanding of the problem. |
| 1 | Gibberish code: Seemingly unrelated to the problem at hand. |
![Page 21: Aspiring Minds | Automata](https://reader035.fdocuments.net/reader035/viewer/2022062223/588394d61a28ab2b568b4ae7/html5/thumbnails/21.jpg)
Automata – Sample report
• Candidate’s source code
• Feedback on programming best practices
• Asymptotic complexity of the candidate’s solution
• Test case pass/fail information
• Problem summary
![Page 22: Aspiring Minds | Automata](https://reader035.fdocuments.net/reader035/viewer/2022062223/588394d61a28ab2b568b4ae7/html5/thumbnails/22.jpg)
Do our fancy features help?
Control and Data dependency features add around 0.15 correlation points above token information.
| Problem | Type of feature | # of features | Cross-val correl | Train correl | Validation correl |
|---|---|---|---|---|---|
| 1 | All, w/o test case | 35 | 0.57 | 0.72 | 0.56 |
| 1 | Basic | 60 | 0.62 | 0.87 | 0.41 |
| 2 | All, w/o test case | 80 | 0.81 | 0.99 | 0.80 |
| 2 | Basic | 26 | 0.59 | 0.72 | 0.67 |
| 3 | All, w/o test case | 190 | 0.87 | 0.97 | 0.90 |
| 3 | Basic | 26 | 0.74 | 0.89 | 0.74 |
| 4 | All, w/o test case | 134 | 0.85 | 0.91 | 0.82 |
| 4 | Basic | 35 | 0.83 | 0.88 | 0.69 |
| 5 | All, w/o test case | 166 | 0.66 | 0.81 | 0.64 |
| 5 | Basic | 40 | 0.61 | 0.78 | 0.61 |
![Page 23: Aspiring Minds | Automata](https://reader035.fdocuments.net/reader035/viewer/2022062223/588394d61a28ab2b568b4ae7/html5/thumbnails/23.jpg)
Conclusion
• We propose the first machine learning based approach to automatically grade programs
• An innovative feature grammar is proposed which matches human intuition of grading programs.
• Models built for sample problems show promising results.
• We propose and demonstrate machine learning techniques that reduce the need for human-graded data when building models.