2020 AppSec California – Santa Monica, CA
Choosing the right static code analyzers based on hard data
This material is based on research sponsored by the Department of Homeland Security (DHS) Science and Technology Directorate, Cyber Security Division (DHS S&T/CSD) via contract number HHSP233201600062C. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Department of Homeland Security.
24 Jan 2020
About the speaker
Chris Horn
Senior Researcher, Secure Decisions
Secure Decisions
§ Cyber R&D, focusing on application security
§ Primarily serving DHS and DoD, some commercial companies
GrammaTech, Inc.
§ Prime contractor to DHS Science & Technology for STAMP
Outline of today’s talk
PART I: Introduction
§ Using static analysis is a good idea
§ Knowing which analyzer to use isn’t easy
PART II: Comparing static software analyzers
§ Seven (7) categories of capabilities
§ How we collect information for Kompar
PART III: Kompar system
§ Progress
§ Plans
Introduction
Static software analysis is a good idea, but knowing which analyzer to use isn’t easy
What is static analysis?
Static analysis is a way to examine software without running / executing it
§ Typically analyzes source code
§ Some analyzers work on compiled binaries
§ Goal is to find quality issues
Analyzers exist for most languages / formats
§ Open source
§ Commercial / proprietary
Also known as
§ Static code analysis
§ Static program analysis
§ Source code analysis
Like help from an expert
Photo: https://en.wikipedia.org/wiki/Pair_programming#/media/File:Pair_programming_1.jpg
What types of issues can static analysis find?
Finds reliability, security, performance, and maintainability issues
§ Capabilities depend on the analysis technology
§ Better at finding implementation issues than design issues
– Buffer handling
– Code quality
– Control flow management
– Encryption and randomness
– Error handling
– File handling
– Formatting
– Hardcoded secrets
– Information leaks
– Initialization and shutdown
– Injection
– Number handling
– Pointer & reference handling
Sources:
Willis, Chuck, and Kris Britton. “Sticking to the Facts.” Presented at Black Hat 2011, Las Vegas, Nevada, August 4, 2011. https://media.blackhat.com/bh-us-11/Willis/BH_US_11_WillisBritton_Analyzing_Static_Analysis_Tools_Slides.pdf
US-CERT https://www.us-cert.gov/bsi/articles/tools/source-code-analysis/source-code-analysis-tools---overview
Using analyzers improves code quality & security
“…automated static analysis is an economical complement to other verification and validation techniques.”
Nortel https://ieeexplore.ieee.org/document/1628970
“We have built a successful static analysis infrastructure at Google that prevents hundreds of bugs per day from entering the Google codebase”
Google https://cacm.acm.org/magazines/2018/4/226371-lessons-from-building-static-analysis-tools-at-google/fulltext
“If you can find code, and the checked system is big enough, and you can compile (enough of) it, then you will always find serious errors.”
Coverity https://cacm.acm.org/magazines/2010/2/69354-a-few-billion-lines-of-code-later/fulltext
“INFER is deployed and [run against] every code modification in Facebook’s mobile apps”
Facebook https://research.fb.com/publications/moving-fast-with-software-verification/
Sign me up, you say
Java application
12,000 LOC
Maven build
IntelliJ IDE
Which analyzers can find SQL injection bugs?
There is no reliable source of information about software analyzers
Our vision
[Slide graphic: two images, each captioned “for software analyzers” and joined by “or”; the images were not preserved]
Build Kompar into a source of information about software analyzers, beginning with static tools
Drive adoption through education
§ Benefits of software analysis
§ Catalog available analyzers
§ Relative strengths & weaknesses
§ Set realistic expectations
Improve market transparency
§ Straightforward comparison
§ Standardized, comprehensive rating methodology
§ Informed consumers create pressure on proprietary tool makers to disclose performance
More secure & reliable software
Kompar will benefit multiple stakeholders
Consumers / adopters of analyzers
§ Assurance & security analysts, developers
§ What analyzers are available?
§ How should I evaluate analyzers?
§ Which analyzers will best meet my needs?
Analyzer makers / vendors
§ How can I drive adoption of my analyzer?
Analyzer researchers
§ NIST, MITRE, NSA, academia
§ Where are today’s analyzers weak?
§ Who is interested in my research?
Comparing static software analyzers
There’s more to analyzers than just finding defects
Seven categories of analyzer properties
1. Basic information
§ Software license
§ On-premises vs. cloud hosting
§ Last release date
2. Process integration
§ Analyzer location
§ Software inputs
§ Viewing & managing output
3. Coverage
§ Programming language & framework
§ Weakness types
4. Speed & scalability
§ Scan time duration
§ Scannable codebase size
§ Number of codebases
5. Results quality
§ Recall, precision, etc.
§ Usability of warnings
6. Reporting
7. Support
1. Basic information
Common, foundational information about an analysis tool
1. Basic information
Is the tool mature & up-to-date?
§ Tool first release date
§ Version release date
Is it run in-house, or off-prem?
§ Self-hosted or SaaS
§ Which operating system will it run on?
Do I have to pay for it?
How is it licensed?
Where can I go to learn more?
§ Tool website
2. Process integration
Capabilities supporting different ways of using an analyzer in my development processes
2. Process integration
“Our most important insight is that careful developer workflow integration is key for static analysis tool adoption.”
Google https://cacm.acm.org/magazines/2018/4/226371-lessons-from-building-static-analysis-tools-at-google/fulltext
“…desirable that the programmer does not have to do anything else than his/her normal job, they should see analysis results as part of their normal workflow rather than requiring them to switch to a different tool.”
Facebook https://research.fb.com/publications/moving-fast-with-software-verification/
Process integration
How will the analyzer fit in my dev. environment?
[Diagram of the development environment: Managers, Issue tracker, IDE, Developers, Code, Source control, Build tools, Testing tools, Security & auditors]
When & where will the analyzer run?
Developer workstation
§ Live analysis while coding in IDE?
§ Pre-commit invocation?
Build server
§ CI integration?
§ Command line interface (CLI)?
Standalone server
§ Scheduled scans?
§ Source control integration?
What inputs does the analyzer require?
Incremental changes to code (commit, patch, pull request)
Source code during build process
Full project source code
Compiled binaries
How do people view findings?
Directly in IDE
Web-based graphical user interface (GUI)
API integration
§ SARIF/XML/JSON/CSV
§ Issue tracker
§ Requirements management system
§ Vulnerability management system
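As an illustration of the SARIF option above, the following is a minimal sketch of reading findings out of a SARIF 2.1.0 JSON log. The function name `summarize_sarif`, the analyzer name, and the sample finding are hypothetical and made up for illustration; real analyzer output will usually carry more fields.

```python
import json


def summarize_sarif(sarif_text):
    """Return (rule_id, file_uri, start_line, message) tuples from SARIF 2.1.0 JSON text."""
    log = json.loads(sarif_text)
    findings = []
    for run in log.get("runs", []):
        for result in run.get("results", []):
            message = result.get("message", {}).get("text", "")
            locations = result.get("locations", [])
            # Use the first reported location, if there is one.
            physical = locations[0].get("physicalLocation", {}) if locations else {}
            uri = physical.get("artifactLocation", {}).get("uri")
            line = physical.get("region", {}).get("startLine")
            findings.append((result.get("ruleId"), uri, line, message))
    return findings


# Hypothetical one-finding log, made up for illustration.
SAMPLE_SARIF = json.dumps({
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "ExampleAnalyzer"}},
        "results": [{
            "ruleId": "CWE-89",
            "message": {"text": "Possible SQL injection"},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": "src/main/java/Login.java"},
                    "region": {"startLine": 42},
                }
            }],
        }],
    }],
})

if __name__ == "__main__":
    for finding in summarize_sarif(SAMPLE_SARIF):
        print(finding)
```

A flat tuple per finding is enough to feed an issue tracker or a CSV export; a richer integration would also keep severity and data-flow information when the analyzer reports it.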
4. Speed & scalability
The ability of an analyzer to handle anticipated workloads
4. Speed & scalability
How large a codebase can the analyzer scan?
How many projects can be scanned?
How long does the analyzer take to work?
§ Scan duration for <software project>, using default checkers
– BodgeIt Store v1.4.0, Broadleaf Commerce v3.0.3, dotCMS v5.0.3, Hadoop v1.1.2, Jenkins v1.534, OWASP Benchmark v1.2beta, etc.
Does the analyzer have features to handle more work?
§ Parallelize on one host
§ Parallelize across multiple hosts
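Scan duration can be measured by timing an analyzer's command-line run. The sketch below is one hedged way to do that; `time_scan` is a hypothetical helper, and the command shown is a stand-in that only starts the Python interpreter, not a real analyzer invocation.

```python
import subprocess
import sys
import time


def time_scan(command):
    """Run an analyzer command line and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return completed.returncode, time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with the
    # analyzer's real CLI invocation against the project under test.
    placeholder_scan = [sys.executable, "-c", "pass"]
    code, seconds = time_scan(placeholder_scan)
    print(f"exit code {code}, wall-clock {seconds:.2f} s")
```

Wall-clock time on a single run is noisy, so repeating the scan several times on the same host and reporting the median would give a fairer comparison between analyzers.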
6. Reporting
Capabilities that support presenting information to create understanding and support decision making
6. Reporting
Graphical user interface (GUI)
§ Ability to search results
§ Results remediation workflow
§ Hierarchical reporting for multiple projects, teams, departments, etc.
§ Filter results by compliance standard
Centralized reporting
§ Role-based access
Results suppression that persists even after code changes
Show differences between the current results set and the previous scan
7. Support
Indicators of how much assistance and guidance is available
7. Support
Guides & documentation
§ Installation
§ User/operator
§ Integration
Open source project health
§ A major undertaking itself
– Linux Foundation CHAOSS (Community Health Analytics OSS)
– Black Duck Open Hub
3. Coverage
The extent to which an analyzer can examine my software
Coverage is mostly about two questions
Will the analyzer work on my software?
§ Programming language & framework
§ Binary format
Can the analyzer find the issues I care about?
§ Which issues does the analyzer claim to detect?
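The two coverage questions can be treated as a filter over a catalog of analyzer properties. The sketch below assumes a hypothetical record layout (`AnalyzerRecord`) and made-up tools; it is not Kompar's data model. Weaknesses are identified by CWE ID, with CWE-89 being SQL injection.

```python
from dataclasses import dataclass, field


@dataclass
class AnalyzerRecord:
    """Hypothetical catalog entry: supported languages and claimed weakness types."""
    name: str
    languages: set = field(default_factory=set)
    claimed_weaknesses: set = field(default_factory=set)  # CWE IDs as strings


def covering(catalog, language, weakness):
    """Names of analyzers that support the language and claim to detect the weakness."""
    return [
        a.name
        for a in catalog
        if language in a.languages and weakness in a.claimed_weaknesses
    ]


# Made-up records for illustration; not real analyzers or real vendor claims.
CATALOG = [
    AnalyzerRecord("ToolA", {"Java", "C#"}, {"CWE-89", "CWE-79"}),
    AnalyzerRecord("ToolB", {"Java"}, {"CWE-476"}),
    AnalyzerRecord("ToolC", {"C/C++"}, {"CWE-89", "CWE-120"}),
]

if __name__ == "__main__":
    # Which analyzers claim to find SQL injection in Java code?
    print(covering(CATALOG, "Java", "CWE-89"))
```

A claim of coverage says nothing about how well the weakness is detected; that is a results quality question.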
Static analyzers have limited weakness coverage
On average, one static analyzer will only detect 14% of all weakness types
Source: Kris Britton and Chuck Willis, “Sticking to the Facts: Scientific Study of Static Analysis Tools”, Sept 2011: http://vimeo.com/32421617
5. Results quality
How well an analyzer can detect issues and the utility of the reported warnings (how useful they are)
Results quality is also mostly about two questions
Can people understand, trust, and use the generated warnings?
§ Usability of discovered issues / warnings
– Explanation of warning & suggested severity
– Confidence information about warning
– Context around warning
» Source code
» Control flow
» Data flow
How well does the analyzer detect issues?
§ Technical performance measures derived using test suites
– Precision, recall, discrimination rate
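The three measures can be computed from labeled test-suite results. The sketch below uses the usual definitions of precision and recall. For discrimination rate it assumes the definition used with paired test cases such as Juliet: a case counts as discriminated when the analyzer flags the flawed variant and does not flag the matching fixed variant. The data is made up.

```python
def precision(tp, fp):
    """Fraction of reported warnings that are real issues."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of real issues that were reported."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def discrimination_rate(cases):
    """cases: list of (flagged_bad, flagged_good) booleans, one pair per test case."""
    if not cases:
        return 0.0
    discriminated = sum(1 for bad, good in cases if bad and not good)
    return discriminated / len(cases)


# Hypothetical results for four paired test cases.
CASES = [(True, False), (True, True), (False, False), (True, False)]
tp = sum(1 for bad, _ in CASES if bad)       # flawed variants flagged
fp = sum(1 for _, good in CASES if good)     # fixed variants flagged
fn = sum(1 for bad, _ in CASES if not bad)   # flawed variants missed

if __name__ == "__main__":
    print(precision(tp, fp), recall(tp, fn), discrimination_rate(CASES))
```

Discrimination rate is stricter than recall: an analyzer that flags everything scores perfect recall but discriminates nothing.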
Kompar platform
Progress & plans
Kompar is more than just a website
Kompar is a set of technologies that work to collect, generate, store, and present information about software analyzers
§ Test suites used to determine
– Execution speed
– Results quality (precision, recall, etc.)
§ Automation to orchestrate execution of analyzers against test suites
§ Automation to label results with true/false positive status
§ Collection of tool properties
§ System to crowdsource collection of analyzer information
§ Content to educate consumers
§ Functionality to help people learn which analyzers meet their needs
– Metrics and scores logic
– Side-by-side comparisons
– More
Kompar system overview
[System overview diagram; component labels: Analyzer info, Analyzer info crowdsourcing, Analyzer benchmarking, Test suites, Analyzer presentation, Find, View, Compare, Ratings & reviews, Requests to catalog, User analytics & reporting]
Kompar website as of Jan 2020
Publicly available
§ https://kompar.tools
§ Catalogs up to 70 analyzer properties
73 tools
Seven categories
§ Basic information
§ Process integration
§ Coverage
§ Speed & scalability
§ Results quality
§ Reporting
§ Support
[Screenshots: landing page, analyzer list, analyzer details]
Challenges ahead
Still much work to do:
§ Information collection
– Refine questionnaire
– Collect more details about each analyzer (esp. weakness coverage)
§ Results quality benchmarking
– Select & augment test suites
– Run analyzers against test suites
– Automation to score analyzer warnings
» “Give credit” for detecting real issues
» “Deduct credit” for detecting non-issues
§ Expand documentation & comparison functionality on Kompar website
§ Make Kompar self-sustaining
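The “give credit” / “deduct credit” scoring idea can be sketched as follows. This is an illustration under stated assumptions, not Kompar's implementation: warnings and known flaws are both reduced to hypothetical `(file, line, CWE)` tuples, and matching is exact, which a real scorer would likely need to relax (for example, allowing nearby lines or related CWEs).

```python
def label_warnings(warnings, known_flaws):
    """Split warnings into true and false positives by exact (file, line, cwe) match."""
    known = set(known_flaws)
    true_pos = [w for w in warnings if w in known]
    false_pos = [w for w in warnings if w not in known]
    missed = sorted(known - set(warnings))
    return true_pos, false_pos, missed


def credit_score(true_pos, false_pos, credit=1.0, penalty=1.0):
    """Give credit for each real issue detected, deduct for each non-issue reported."""
    return credit * len(true_pos) - penalty * len(false_pos)


# Made-up ground truth and analyzer output for illustration.
KNOWN_FLAWS = {("Login.java", 42, "CWE-89"), ("Upload.java", 10, "CWE-22")}
WARNINGS = [("Login.java", 42, "CWE-89"), ("Util.java", 7, "CWE-476")]

if __name__ == "__main__":
    tp, fp, missed = label_warnings(WARNINGS, KNOWN_FLAWS)
    print("true positives:", tp)
    print("false positives:", fp)
    print("missed flaws:", missed)
    print("score:", credit_score(tp, fp))
```

The `credit` and `penalty` weights are placeholders; how heavily to penalize false positives relative to missed flaws is a policy choice the source leaves open.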
DRAFT Analyzer info. collection, in-progress submissions
DRAFT Analyzer info. collection, analyzer question form
DRAFT Analyzer info. collection, weakness coverage
DRAFT Analyzer info. collection, submit for review
Candidate benchmark test suites
Programming language | Candidate test suites to generate results quality info. | Source
Java | Juliet 1.3 | NIST SARD
Java | Benchmark | OWASP
Java | Application CVEs | NIST SARD, Vulnerability History Project
C/C++ | Juliet 1.3 | NIST SARD
C/C++ | Multiple others | NIST SARD
C/C++ | Application CVEs | NIST SARD, Vulnerability History Project
Python | Application CVEs | Needs curation
C# | C# Vulnerability Test Suite | NIST SARD
PHP | PHP Vulnerability Test Suite | NIST SARD
Visual Basic .NET | Unknown |
JavaScript | Juice Shop | OWASP
NIST SARD test suites: https://samate.nist.gov/SARD/testsuite.php
Vulnerability History Project (curated CVEs): http://vulnerabilityhistory.org/
OWASP Benchmark: https://owasp.org/www-project-benchmark/
Make a contribution
Kompar: https://kompar.tools
Request addition of an analyzer: https://www.surveygizmo.com/s3/4890321/Kompar-Tool-Request
Submit details about your analyzer: https://www.surveygizmo.com/s3/4885264/Kompar-Detailed-Tool-Request
Backlog for managing analyzer cataloging activity: https://trello.com/b/gzrRyvAE
Strike up a conversation: [email protected]