Post on 16-Apr-2017
December 8, 2016
Andrew Clark, IT Auditor / Internal Audit Data Scientist
Astec Industries, Inc., M.S. Data Science Candidate
Where Open Source Meets Audit Analytics
Overview
1. What is open source software?
2. Why is it important?
3. What are the benefits of using open source software for analytics over CAATs?
4. How do I begin using open source software for analytics?
5. Case study
6. The application of advanced analytic techniques
Meet Open Source
Open Source Software
"Open source software is software whose source code is available for modification or enhancement by anyone." - "What Is Open Source?" Opensource.com. Accessed June 12, 2016. https://opensource.com/resources/what-open-source.
Open Source examples
1. Linux (mainly)
2. Android (mainly)
3. Firefox
4. R programming language
5. Git
6. Docker
Why is it important?
Vibrant community
Frequent updates
Potential for strong security
Cutting edge technology
Customizable
Cost
How does Open Source relate to Audit Analytics?
State of the art technology
Computer science's best and brightest love to contribute
Customizable
Scalability
Beautiful visualizations
Analytics and Data Science leaders, e.g. Google, Facebook, Uber, and Airbnb, use open source frameworks almost exclusively for their analytics.
"Bubble Charts." Plotly. Accessed August 14, 2016. https://plot.ly/python/bubble-charts/.
Benefits over traditional CAATs
ACL, IDEA, Arbutus, the existing market leaders
Not very user friendly
Requires extensive training to use effectively
Not very flexible
Does not provide the output auditors are expecting
So what do we do about it?
Enter Python (and R)
Open source, general purpose programming language
High level of support
Used by some of the best and brightest in Data Science
Extensive scientific, mathematical, data wrangling, and visualization libraries
Most popular first language in computer science departments across America (http://tinyurl.com/knw5mdv)
What is Python?
"About Python." Python.org. Accessed August 14, 2016. https://www.python.org/about/.
What is R?
"R is a language and environment for statistical computing and graphics." - "What Is R?" The R Project for Statistical Computing. Accessed August 14, 2016. https://www.r-project.org/about.html.
Used widely by statisticians for statistical analysis
As a result of its widespread use, thousands of easy-to-implement libraries provide *all* widely used statistical techniques
Is not a 'real' (general purpose) programming language; it is designed for statistics
How would we go about using Python (or R)?
The hard way: by learning it
The even harder way: hire an auditor with programming, analytics and auditing experience
The *easiest* and most effective way: create a cross-functional team by borrowing a programmer from IT and a business analyst from the business.
Example Python (and R) analytic test
https://github.com/aclarkData/AuditAnalytics
Journal entry tests: 999 amounts, weekend postings, and keywords
Steps:
Import libraries
Import data
Wrangle as needed
Export to folder
Schedule: Task Scheduler in Windows; cron or an equivalent on Unix-based systems, e.g. Mac and Linux
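The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the linked repository: it uses only the standard library, and the column names (`entry_id`, `posted_date`, `amount`, `description`), the keyword list, and the reading of the "999 amount" test (whole-dollar part ending in 999) are all assumptions made for the example.

```python
# Step 1: import libraries (standard library only in this sketch).
import csv
import io
from datetime import datetime

# Hypothetical keyword list; a real test would use terms agreed with the audit team.
KEYWORDS = ("adjust", "plug", "reverse")


def flag_entry(row):
    """Return the names of the tests that a journal entry row trips."""
    flags = []
    amount = float(row["amount"])
    posted = datetime.strptime(row["posted_date"], "%Y-%m-%d")
    description = row["description"].lower()

    # Assumed reading of the "999 amount" test: whole-dollar part ends in 999.
    if int(abs(amount)) % 1000 == 999:
        flags.append("999_amount")
    # weekday() returns 5 for Saturday and 6 for Sunday.
    if posted.weekday() >= 5:
        flags.append("weekend")
    if any(word in description for word in KEYWORDS):
        flags.append("keyword")
    return flags


def run_tests(in_file, out_file):
    """Steps 2-4: read entries, keep flagged rows, write them with a 'flags' column."""
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames + ["flags"])
    writer.writeheader()
    flagged = 0
    for row in reader:
        flags = flag_entry(row)
        if flags:
            row["flags"] = ";".join(flags)
            writer.writerow(row)
            flagged += 1
    return flagged


if __name__ == "__main__":
    # In-memory sample data so the sketch runs without any files.
    sample = io.StringIO(
        "entry_id,posted_date,amount,description\n"
        "1,2016-08-13,1500.00,Accrual for freight\n"
        "2,2016-08-15,4999.00,Office supplies\n"
        "3,2016-08-16,250.00,Manual adjustment per controller\n"
        "4,2016-08-17,320.10,Utilities\n"
    )
    output = io.StringIO()
    print(run_tests(sample, output), "entries flagged")
    print(output.getvalue())
```

To work on real files, pass objects from `open(path, newline="")` instead of the in-memory buffers. For the scheduling step, a crontab line such as `0 6 * * 1 python3 je_tests.py` (script name hypothetical) would run the tests every Monday at 06:00.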
Machine Learning
In essence, a machine learns patterns in data without having to be explicitly programmed.
Very, very powerful technology that is transforming banking, search engines, advertising, and soon, every industry.
Examples: credit card fraud detection, targeted demographic advertising, detection of anomalous sensor data, etc.
Machine Learning Cont.
Numerous possibilities for utilizing machine learning and related technology, e.g. Natural Language Processing, for Financial Auditing
For example, an unsupervised clustering algorithm is in use at Astec Industries.
Latest developments are only available in open source software or in expensive statistical or computational programs such as SAS, which currently runs at a minimum of $9,200 upfront per single-user license plus annual fees. - "SAS® Analytics Pro." SAS®. Accessed August 26, 2016. https://www.sas.com/store/software/analytics-pro/prodPERSANL.html.
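To make the idea of unsupervised clustering concrete, here is a minimal sketch of k-means on one-dimensional transaction amounts. It is not the algorithm used at Astec Industries, which the slides do not describe; the function name `kmeans_1d`, the sample amounts, and the choice to treat the smallest cluster as the one worth reviewing are all assumptions for illustration. In practice a library such as scikit-learn would normally be used instead of hand-written code.

```python
from collections import Counter


def kmeans_1d(values, k=2, iterations=20):
    """Cluster numbers into k groups; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels


if __name__ == "__main__":
    # Made-up amounts: mostly routine values plus two much larger ones.
    amounts = [120.0, 135.5, 98.0, 110.25, 9800.0, 125.0, 10250.0]
    centroids, labels = kmeans_1d(amounts, k=2)

    # Assumption for this sketch: the least populated cluster is the unusual one.
    counts = Counter(labels)
    rare = min(counts, key=counts.get)
    unusual = [v for v, lab in zip(amounts, labels) if lab == rare]
    print("centroids:", centroids)
    print("amounts to review:", unusual)
```

No labels are supplied: the algorithm groups the amounts purely by similarity, and the auditor then decides which group merits follow-up.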
Possibilities:
Time Series Machine Learning for predicting account balances
Natural Language Processing techniques for contract review and summarization - the current bottleneck is Optical Character Recognition (OCR) technology.
Sentiment Analysis for Journal Entry and Transaction descriptions.
Jupyter notebooks for reproducible analytics and audit documentation
https://try.jupyter.org/
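As a rough illustration of the first possibility above, predicting account balances, the sketch below fits a straight-line trend to past period-end balances by ordinary least squares and extrapolates it. This is the simplest possible baseline rather than a full time series machine learning model; the function names and the sample balances are hypothetical.

```python
def fit_trend(balances):
    """Least squares fit of balance against period index 0..n-1 (needs n >= 2)."""
    n = len(balances)
    x_mean = (n - 1) / 2
    y_mean = sum(balances) / n
    slope_num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(balances))
    slope_den = sum((x - x_mean) ** 2 for x in range(n))
    slope = slope_num / slope_den
    intercept = y_mean - slope * x_mean
    return slope, intercept


def forecast(balances, periods_ahead=1):
    """Extrapolate the fitted trend line periods_ahead past the last observation."""
    slope, intercept = fit_trend(balances)
    return intercept + slope * (len(balances) - 1 + periods_ahead)


if __name__ == "__main__":
    # Made-up month-end balances for a single account.
    history = [10200.0, 10450.0, 10390.0, 10720.0, 10880.0, 11010.0]
    expected = forecast(history)
    print("expected next balance:", round(expected, 2))
```

An auditor could compare the actual balance with the forecast and investigate accounts where the difference exceeds a chosen tolerance.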
Conclusion:
Definition of Open Source Software
Unlimited possibilities for a customizable analytics experience
Scalable
Real world example
Machine Learning and the future of audit analytics
THANK YOU!
Please Remember To Fill Out Your Session Evaluation Forms!