Code Analysis via Version Control History
Justin Mclean, Class Software
Twitter: @justinmclean
Blog: http://blog.classsoftware.com
Who am I?
• Programming for 25 years
• Developing and creating web applications for 15 years
• Apache Flex PMC, Incubator PMC, Apache member
• Release manager for Apache Flex, FlexUnit, Tour De Flex, Squiggly
• Run an IoT meetup in Sydney, Australia
In the last 40 years we have written billions of lines of code that will keep programmers employed for trillions of man hours in the next few thousand years to clean up this mess we’ve made.
Joe Armstrong
The Mess We’re In
Your Code as a Crime Scene
You Write Code?
• 40-80% of all software effort goes into maintenance
• This is difficult and expensive
• More so with agile methodologies and on successful systems
• How do we make this effective?
• The primary goal is to understand existing code
• All code is legacy code
Detecting Issues With Code
• Code reviews
• Pair programming
• Unit Tests
• Continuous Integration
• Static code analysis
• Complexity metrics
Scalability
• What about large code bases?
• How do you decide what to work on?
• How do you work out where the bugs are?
• Which bugs are important?
Code Visualisation
• Can you visually represent code to get a better understanding of what’s going on?
• Code City http://www.inf.usi.ch/phd/wettel/codecity-wof.html
• Makes a city out of the code: each class is a building, and packages are city blocks
• Building height is the number of methods, colour is coded by number of lines and base area by number of attributes
Code City
Code City Limitations
• Supports only a few common languages
• Shows hotspots (large buildings) but gives no real indication of where to spend effort
• Existing hotspots may be stable
• Missing an important dimension: how the code changes over time
Version Control History
• Version control systems (VCS) contain a lot of useful information that we mostly ignore, such as:
• Who changed which lines, and when
• How much and how often things change
• Can we aggregate this information to tell us something useful?
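The raw material for everything that follows is the commit log. Code Maat reads a `git2` format log, which its README suggests generating with `git log --all --numstat --date=short --pretty=format:'--%h--%ad--%aN' --no-renames`. As a rough sketch of what that log contains, assuming exactly that format, the Python below parses it into one record per file per commit. `parse_git2_log` is a hypothetical helper for illustration, not part of Code Maat.

```python
from typing import List, Tuple

# One record per file touched by a commit: (rev, date, author, path, added, deleted)
Change = Tuple[str, str, str, str, int, int]


def parse_git2_log(text: str) -> List[Change]:
    """Parse `git log --numstat --pretty=format:'--%h--%ad--%aN'` output (assumed format)."""
    changes: List[Change] = []
    rev = date = author = ""
    for line in text.splitlines():
        if line.startswith("--"):
            # Commit header: --<short hash>--<date>--<author name>
            rev, date, author = line[2:].split("--", 2)
        elif line.strip():
            # numstat line: <added>\t<deleted>\t<path>; binary files show "-"
            added, deleted, path = line.split("\t", 2)
            changes.append((rev, date, author, path,
                            0 if added == "-" else int(added),
                            0 if deleted == "-" else int(deleted)))
    return changes


if __name__ == "__main__":
    log = "--f52eb16--2015-03-01--Alice\n10\t2\tbuild.xml\n-\t-\tlogo.png\n"
    print(parse_git2_log(log))
    # [('f52eb16', '2015-03-01', 'Alice', 'build.xml', 10, 2),
    #  ('f52eb16', '2015-03-01', 'Alice', 'logo.png', 0, 0)]
```

Each record answers "who changed what, when, and by how much", which is all the later analyses need.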
Why Code Changes
• Fixing bugs
• Refactoring poor design
• Poor understanding of the problem
• Change frequency is a proxy for effort
• Code that has changed in the past is likely to change again in the future
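Using change frequency as a proxy for effort comes down to counting commits per file. A minimal sketch, assuming the log has already been reduced to `(date, path)` pairs with ISO `YYYY-MM-DD` dates (so plain string comparison orders them correctly); the optional `since` cut-off is a hypothetical parameter standing in for choosing the analysis time period.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def change_frequency(changes: Iterable[Tuple[str, str]],
                     since: Optional[str] = None) -> List[Tuple[str, int]]:
    """Count commits per file, most frequently changed first.

    changes: (date, path) pairs, one per file touched by a commit, with
    ISO dates (YYYY-MM-DD). Changes dated before `since` are ignored.
    """
    counts = Counter(path for date, path in changes
                     if since is None or date >= since)
    return counts.most_common()


if __name__ == "__main__":
    history = [("2014-01-10", "build.xml"), ("2015-02-01", "build.xml"),
               ("2015-03-05", "build.xml"), ("2015-03-05", "DataGrid.as")]
    print(change_frequency(history, since="2015-01-01"))
    # [('build.xml', 2), ('DataGrid.as', 1)]
```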
Effort + Complexity
• Change frequency / effort is not the whole story
• Config files change frequently, but are seldom complex
• Where effort and complexity overlap, we find possible hotspots
Code Hotspots
• Hotspots are code where changes will give the most benefit
• Frequent changes to complex code indicate poor quality code
• There has been a lot of research in this area, and surprisingly simple measures (such as change frequency) perform just as well as more complex ones
What use are Hotspots?
• Take cognitive biases out of the equation
• Show where your bugs are likely to be
• Prime areas for code reviews
• Prime areas for refactoring
• Targets for extra testing
Code Maat
• Performs various analyses on version control history
• Produces simple CSV text files
• Supports parsing git, svn and hg logs
• Open source (GPL)
• https://github.com/adamtornhill/code-maat
Producing Hotspot Data
• Clone the git repo
• Work out time period
• Generate git log summary data
• Look at the summary
• Generate change frequencies
• Generate code complexity metrics
• Combine change frequency + complexity
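The last step, combining change frequency with complexity, is a join on file path. This sketch assumes Code Maat's `revisions` output has `entity,n-revs` columns and that `cloc --by-file --csv` output has `language,filename,blank,comment,code` columns, with file names prefixed by `./` when cloc is run from the repository root. It produces rows in the `module,revisions,code` layout shown for Apache Flex, but `merge_hotspots` is a hypothetical helper, not Code Maat's own code.

```python
import csv
import io
from typing import Dict, List, Tuple


def load_loc(cloc_csv: str) -> Dict[str, int]:
    """Map file path -> lines of code from `cloc --by-file --csv` output."""
    loc: Dict[str, int] = {}
    for row in csv.DictReader(io.StringIO(cloc_csv)):
        name = row.get("filename") or ""
        if name.startswith("./"):
            name = name[2:]  # cloc run from the repo root prefixes "./"
        if name and row.get("language") != "SUM":  # skip any totals row
            loc[name] = int(row["code"])
    return loc


def merge_hotspots(revisions_csv: str, cloc_csv: str) -> List[Tuple[str, int, int]]:
    """Join revision counts with LOC into (module, revisions, code) rows."""
    loc = load_loc(cloc_csv)
    merged = []
    for row in csv.DictReader(io.StringIO(revisions_csv)):
        path = row["entity"]
        if path in loc:  # files deleted since, or unknown to cloc, drop out
            merged.append((path, int(row["n-revs"]), loc[path]))
    # Most-changed first; ties broken by size
    merged.sort(key=lambda r: (r[1], r[2]), reverse=True)
    return merged


if __name__ == "__main__":
    revs = "entity,n-revs\nbuild.xml,139\nframeworks/build.xml,57\n"
    cloc = ("language,filename,blank,comment,code\n"
            "XML,./build.xml,10,5,1481\nXML,./frameworks/build.xml,3,2,423\n")
    print("module,revisions,code")
    for module, n, code in merge_hotspots(revs, cloc):
        print(f"{module},{n},{code}")
```

Dropping files that cloc does not know about is a deliberate choice: a file with history but no current LOC has usually been deleted or is binary.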
Apache Flex Project
• Large code base: 25,000 files, 5 million LOC, or about 20 million including tests
• Mix of many file types and languages: MXML, ActionScript, Java, XML
• Two distinct phases: before and after the donation to Apache
Adobe Flex SDK Summary
statistic,value
number-of-commits,30090
number-of-entities,4672
number-of-entities-changed,34916
number-of-authors,18
Apache Flex SDK Summary
statistic,value
number-of-commits,2911
number-of-entities,51505
number-of-entities-changed,81012
number-of-authors,55
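These summary figures are simple aggregates over the parsed log. A sketch of how such numbers can be derived, assuming one `(rev, author, path)` record per file touched by a commit; this is our reading of what the four statistics count, not Code Maat's source.

```python
from typing import Dict, Iterable, Tuple


def summary(changes: Iterable[Tuple[str, str, str]]) -> Dict[str, int]:
    """changes: (rev, author, path) records, one per file touched per commit."""
    records = list(changes)
    return {
        "number-of-commits": len({rev for rev, _, _ in records}),
        "number-of-entities": len({path for _, _, path in records}),
        # every file touched by every commit counts once
        "number-of-entities-changed": len(records),
        "number-of-authors": len({author for _, author, _ in records}),
    }


if __name__ == "__main__":
    log = [("r1", "Alice", "build.xml"), ("r1", "Alice", "DataGrid.as"),
           ("r2", "Bob", "build.xml")]
    print("statistic,value")
    for name, value in summary(log).items():
        print(f"{name},{value}")
    # statistic,value
    # number-of-commits,2
    # number-of-entities,2
    # number-of-entities-changed,3
    # number-of-authors,2
```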
Complexity
• LOC is a terrible complexity measure, but it turns out to be no worse than most others
• Fast and simple
• Language agnostic
• CLOC
• https://github.com/AlDanial/cloc
Apache Flex SDK Revisions
module,revisions,code
build.xml,139,1481
frameworks/build.xml,57,423
mustella/jenkins.sh,48,167
mustella/build.xml,44,1988
ide/checkAllPlayerGlobals.sh,35,85
frameworks/projects/mobiletheme/defaults.css,34,1568
modules/downloads.xml,34,384
installer.xml,29,821
frameworks/projects/spark/build.xml,28,231
frameworks/projects/textLayout/build.xml,28,218
frameworks/projects/framework/src/mx/collections/ListCollectionView.as,28,1442
Hot Spot Confirmation
• Build files
• Mobile theme
• ListCollections
• DataGrid and AdvancedDataGrid
• DateField
Hot Spot Limitations
• Just numbers - may need to normalise
• Time period may be hard to get right (hotspots move)
• Impacted by individual commit styles
• May have false positives
• Just a guide - but still a very useful one
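On the first point, raw revision counts and LOC sit on very different scales. One simple way to normalise, assumed here rather than taken from the talk (and by no means the only option), is to divide each measure by its maximum and multiply the two into a single 0 to 1 score. The sample rows reuse four entries from the Apache Flex revisions table; `hotspot_scores` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def scale_by_max(values: Sequence[float]) -> List[float]:
    """Scale values into 0..1 by dividing by the largest value."""
    top = max(values, default=0)
    return [v / top if top else 0.0 for v in values]


def hotspot_scores(rows: Sequence[Tuple[str, int, int]]) -> List[Tuple[str, float]]:
    """rows: (module, revisions, code). Score = scaled revisions * scaled LOC."""
    revs = scale_by_max([r[1] for r in rows])
    locs = scale_by_max([r[2] for r in rows])
    scores = [(row[0], round(rv * lc, 3)) for row, rv, lc in zip(rows, revs, locs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    flex = [("build.xml", 139, 1481),
            ("mustella/build.xml", 44, 1988),
            ("frameworks/projects/framework/src/mx/collections/ListCollectionView.as", 28, 1442),
            ("ide/checkAllPlayerGlobals.sh", 35, 85)]
    for module, score in hotspot_scores(flex):
        print(f"{score:.3f}  {module}")
    # 0.745  build.xml
    # 0.317  mustella/build.xml
    # 0.146  frameworks/projects/framework/src/mx/collections/ListCollectionView.as
    # 0.011  ide/checkAllPlayerGlobals.sh
```

With this scoring ListCollectionView.as moves above the shell script despite having fewer revisions, because its size now counts too.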
Visualise Hotspots
• It is hard to understand a large amount of information
• Classes are nested in packages, and we have both complexity and change frequency / effort
• Circle packing works well: circle size is LOC, colour is change frequency
• D3.js is easy to use and displays easily in a browser
• Needs a CSV to JSON conversion
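The CSV to JSON step turns flat file paths into the nested structure that D3's hierarchy layouts consume. A minimal sketch, assuming `(path, revisions, code)` rows like the merged hotspot CSV. The JSON field names (`name`, `children`, `size`, `revisions`) are our own choice; on the D3 side they would be read with something like `d3.hierarchy(data).sum(d => d.size)` before applying `d3.pack()`.

```python
import json
from typing import Any, Dict, Iterable, Tuple


def to_d3_hierarchy(rows: Iterable[Tuple[str, int, int]],
                    root_name: str = "root") -> Dict[str, Any]:
    """Nest (path, revisions, code) rows by directory into {name, children} nodes."""
    root: Dict[str, Any] = {"name": root_name, "children": []}
    for path, revisions, code in rows:
        node = root
        *dirs, filename = path.split("/")
        for part in dirs:
            # Reuse the directory node if it already exists, else create it.
            child = next((c for c in node["children"]
                          if c["name"] == part and "children" in c), None)
            if child is None:
                child = {"name": part, "children": []}
                node["children"].append(child)
            node = child
        # Leaves carry circle size (LOC) and the value used for colour.
        node["children"].append({"name": filename, "size": code,
                                 "revisions": revisions})
    return root


if __name__ == "__main__":
    rows = [("build.xml", 139, 1481), ("frameworks/build.xml", 57, 423)]
    print(json.dumps(to_d3_hierarchy(rows, "flex-sdk"), indent=2))
```

The linear search over children keeps the sketch short; for tens of thousands of files a per-node dictionary index would be faster.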
Apache Flex SDK
Apache Flex SDK
Apache Flex SDK
Apache Flex SDK
Hotspot Analysis
• Hotspots are a small proportion of all code
• Configuration files vs complex application logic
• Can have false positives, so they need to be confirmed
Hotspot Analysis
• Experimental area
• 3rd party modules
• Compiler
• Data grids
• Support classes
Complexity (again)
• LOC is OK, but is there something better?
• Whitespace indentation!
• Easy to calculate / language independent
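Whitespace complexity can be computed from indentation alone, without parsing the language. This sketch produces the same `n,total,mean,sd,max` fields as the DataGrid.as figures in this talk, under explicit assumptions: blank lines are skipped, one tab or four spaces count as one logical indent, and `sd` is the population standard deviation. These choices are ours and may differ from the script used to produce the talk's numbers.

```python
import statistics
from typing import Dict


def indentation_complexity(source: str, spaces_per_indent: int = 4) -> Dict[str, float]:
    """Logical indentation statistics (n, total, mean, sd, max) for one file."""
    depths = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no indentation information
        leading = line[: len(line) - len(line.lstrip(" \t"))]
        # Assumption: a tab is one logical indent, N spaces are one indent
        depths.append(leading.count("\t") + leading.count(" ") / spaces_per_indent)
    if not depths:
        return {"n": 0, "total": 0, "mean": 0.0, "sd": 0.0, "max": 0}
    return {
        "n": len(depths),
        "total": sum(depths),
        "mean": round(statistics.mean(depths), 2),
        "sd": round(statistics.pstdev(depths), 2),
        "max": max(depths),
    }


if __name__ == "__main__":
    src = ("class A {\n    function f() {\n        if (x) {\n"
           "            y();\n        }\n    }\n}\n")
    print(indentation_complexity(src))
    # {'n': 7, 'total': 9.0, 'mean': 1.29, 'sd': 1.03, 'max': 3.0}
```

As with DataGrid.as, a low mean with a high `max` points at a few deeply nested blocks worth inspecting.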
DataGrid.as Complexity
• CLOC shows a 50/50 split between code and comments, with 2800 lines of actual code
• Maybe a good idea to remove comments?
• DataGrid.as whitespace complexity:
n,total,mean,sd,max
5860,9244,1.58,1.21,13
• Mean is low, sd is low, but max is way too high
• A real hotspot
Complexity Trend
rev,n,total,mean,sd
f52eb16,5608,8816,1.57,1.23
0c4290c,5609,8816,1.57,1.23
63580a8,5644,8856,1.57,1.23
fa2108b,5644,8856,1.57,1.23
abc381b,5644,8856,1.57,1.23
774cdd7,5791,9061,1.56,1.22
5e6e5c3,5791,9061,1.56,1.22
4388da8,5791,9061,1.56,1.22
ec1ac28,5810,9090,1.56,1.22
22b68de,5839,9124,1.56,1.22
b1d0359,5855,9164,1.57,1.22
1bef097,5851,9158,1.57,1.22
3e752d9,5854,9165,1.57,1.22
6c53962,5857,9172,1.57,1.22
c47f9f9,5855,9169,1.57,1.22
bb600fd,5855,9169,1.57,1.22
71f8757,5853,9230,1.58,1.21
3a1769b,5860,9247,1.58,1.21
8767c20,5860,9244,1.58,1.21
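A trend like this is easier to read as a few deltas. The sketch below assumes the rows are ordered oldest revision first (the talk does not say so explicitly) and compares the first and last rows; `trend_delta` and its output keys are hypothetical names.

```python
import csv
import io
from typing import Dict


def trend_delta(trend_csv: str) -> Dict[str, float]:
    """Summarise rev,n,total,mean,sd rows, assumed to be oldest revision first."""
    rows = list(csv.DictReader(io.StringIO(trend_csv)))
    first, last = rows[0], rows[-1]
    growth = int(last["total"]) - int(first["total"])
    return {
        "revisions": len(rows),
        "n_change": int(last["n"]) - int(first["n"]),
        "total_change": growth,
        "total_change_pct": round(100 * growth / int(first["total"]), 1),
        "mean_change": round(float(last["mean"]) - float(first["mean"]), 2),
    }


if __name__ == "__main__":
    data = """rev,n,total,mean,sd
f52eb16,5608,8816,1.57,1.23
8767c20,5860,9244,1.58,1.21
"""
    print(trend_delta(data))
    # {'revisions': 2, 'n_change': 252, 'total_change': 428,
    #  'total_change_pct': 4.9, 'mean_change': 0.01}
```

Under that ordering assumption, total indentation grew by roughly 5% while the mean barely moved: the file got bigger rather than more deeply nested on average.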
Temporal Coupling
• Files that need to change at the same time
• Causes:
• Copy paste duplicated code
• Inadequate encapsulation
• Anti-pattern sometimes referred to as shotgun surgery
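Temporal coupling can be measured from how often files appear in the same commit. This sketch follows the idea behind Code Maat's `coupling` output as we understand it (shared commits as a percentage of the pair's average number of revisions, reported as `degree` and `average-revs`). It is a simplified re-implementation with hypothetical filter thresholds (`min_shared`, `min_degree`), not Code Maat's code or defaults.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Iterable, List, Tuple


def temporal_coupling(changes: Iterable[Tuple[str, str]],
                      min_shared: int = 2,
                      min_degree: int = 30) -> List[Tuple[str, str, int, int]]:
    """changes: (rev, path) pairs. Returns (entity, coupled, degree, average-revs)."""
    files_by_rev = defaultdict(set)
    for rev, path in changes:
        files_by_rev[rev].add(path)

    revs: Counter = Counter()    # commits per file
    shared: Counter = Counter()  # commits per pair of files
    for files in files_by_rev.values():
        revs.update(files)
        for a, b in combinations(sorted(files), 2):
            shared[(a, b)] += 1

    rows = []
    for (a, b), n in shared.items():
        average = (revs[a] + revs[b]) / 2
        degree = round(100 * n / average)  # % of the pair's average revisions
        if n >= min_shared and degree >= min_degree:
            rows.append((a, b, degree, round(average)))
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows


if __name__ == "__main__":
    history = [("r1", "AddItems.as"), ("r1", "Group.as"),
               ("r2", "AddItems.as"), ("r2", "Group.as"),
               ("r3", "AddItems.as"), ("r3", "Container.as"),
               ("r4", "Group.as")]
    print(temporal_coupling(history))  # [('AddItems.as', 'Group.as', 67, 3)]
```

The thresholds are what tame the noise: without `min_shared`, any two files committed together once would report a high degree.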
Detect Temporal Coupling
• Results are a bit noisy, so you may need to filter them
entity,coupled,degree,average-revs
frameworks/projects/framework/src/mx/states/AddItems.as,frameworks/projects/spark/src/spark/components/Group.as,92,7
frameworks/projects/framework/src/mx/states/AddItems.as,frameworks/projects/mx/src/mx/core/Container.as,83,6
frameworks/projects/mx/src/mx/core/Container.as,frameworks/projects/spark/src/spark/components/SkinnableContainer.as,83,6
frameworks/projects/framework/src/mx/states/AddItems.as,frameworks/projects/spark/src/spark/components/SkinnableContainer.as,83,6
frameworks/projects/mx/src/mx/core/Container.as,frameworks/projects/spark/src/spark/components/Group.as,76,7
Just Words?
Knowledge Map
• Generate ownership of files
• Multiple owners per file imply more potential bugs
• Knowledge maps: who knows most about which files
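A knowledge map starts from per-file ownership. One common heuristic, assumed here, treats the author who added the most lines to a file as its main developer. The sketch takes `(author, path, added_lines)` records (for example from the numstat log) and also reports how many authors touched each file, since multiple owners are flagged above as a risk. `main_developers` is a hypothetical helper.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple


def main_developers(changes: Iterable[Tuple[str, str, int]]
                    ) -> List[Tuple[str, str, int, int, float, int]]:
    """Return (entity, main-dev, added, total-added, ownership, n-authors) per file."""
    added: DefaultDict[str, DefaultDict[str, int]] = defaultdict(lambda: defaultdict(int))
    for author, path, lines in changes:
        added[path][author] += lines
    rows = []
    for path, by_author in sorted(added.items()):
        total = sum(by_author.values())
        # Main developer: the author with the most added lines (first wins ties)
        dev, dev_lines = max(by_author.items(), key=lambda kv: kv[1])
        ownership = round(dev_lines / total, 2) if total else 0.0
        rows.append((path, dev, dev_lines, total, ownership, len(by_author)))
    return rows


if __name__ == "__main__":
    log = [("Alice", "DataGrid.as", 100), ("Bob", "DataGrid.as", 50),
           ("Alice", "DataGrid.as", 20), ("Carol", "Group.as", 10)]
    for row in main_developers(log):
        print(row)
    # ('DataGrid.as', 'Alice', 120, 170, 0.71, 2)
    # ('Group.as', 'Carol', 10, 10, 1.0, 1)
```

Colouring each file in the circle-packing view by its main developer turns this table into a knowledge map.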
Apache Flex SDK
Apache Flex SDK
What we have learnt
• There is a lot of useful information in your version control history waiting to be found
• Technique scales easily to (very) large code bases
• Keep data formats simple
• Simple measures of effort and complexity are often as good as complex ones
• Can find out areas in need of attention in your code base
Links
• Your Code as a Crime Scene https://pragprog.com/book/atcrime/your-code-as-a-crime-scene
• Code City http://www.inf.usi.ch/phd/wettel/codecity.html
• CLOC https://github.com/AlDanial/cloc
• Code Maat https://github.com/adamtornhill/code-maat
Ask now, see me after the session, follow me on Twitter @justinmclean or email me.
Slides can be found at the conference site.
Questions?