ITAC 2016 Where Open Source Meets Audit Analytics

Posted on 16-Apr-2017



December 8, 2016

Andrew Clark, IT Auditor / Internal Audit Data Scientist

Astec Industries, Inc., M.S. Data Science Candidate


Overview

1. What is open source software?

2. Why is it important?

3. What are the benefits of using open source software for analytics over traditional computer-assisted audit techniques (CAATs)?

4. How do I begin using open source software for analytics?

5. Case study

6. The application of advanced analytic techniques

Meet Open Source

Open Source Software

"Open source software is software whose source code is available for modification or enhancement by anyone." - "What Is Open Source?" Opensource.com. Accessed June 12, 2016. https://opensource.com/resources/what-open-source.

Open Source examples

1. Linux (mostly open source)

2. Android (mostly open source)

3. Firefox

4. R programming language

5. Git

6. Docker

Why is it important?

Vibrant community

Frequent updates

Potential for strong security

Cutting edge technology

Customizable

Cost

How does Open Source relate to Audit Analytics?

State-of-the-art technology

Computer science's best and brightest love to contribute

Customizable

Scalability

Beautiful visualizations

Analytics and Data Science leaders, e.g., Google, Facebook, Uber, and Airbnb, use open source frameworks almost exclusively for their analytics.

"Bubble Charts." Plotly. Accessed August 14, 2016. https://plot.ly/python/bubble-charts/.

Benefits over traditional CAATs

ACL, IDEA, and Arbutus are the existing market leaders

Not very user-friendly

Require extensive training to use effectively

Not very flexible

Do not provide the output auditors expect

So what do we do about it?

Enter Python (and R)

Open source, general-purpose programming language

Large, active support community

Used by some of the best and brightest in Data Science

Extensive scientific, mathematical, data wrangling, and visualization libraries

Most popular introductory programming language in U.S. computer science departments (http://tinyurl.com/knw5mdv)

What is Python?

"About Python." Python.org. Accessed August 14, 2016. https://www.python.org/about/.

What is R?

"R is a language and environment for statistical computing and graphics." - "What Is R?" The R Project for Statistical Computing. Accessed August 14, 2016. https://www.r-project.org/about.html.

Used widely by statisticians for statistical analysis

As a result of its widespread use, thousands of easy-to-use libraries (packages) exist that provide *all* widely used statistical techniques

Less suited than Python to general-purpose programming

How would we go about using Python (or R)?

The hard way: by learning it

The even harder way: hire an auditor with programming, analytics and auditing experience

The *easiest* and most effective way: create a cross-functional team by borrowing a programmer from IT and a business analyst from the business.

Example Python (and R) analytic test

https://github.com/aclarkData/AuditAnalytics

Journal entry tests for 999 amounts, weekend postings, and keywords
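The three journal entry tests named above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in the repository linked above: it uses only the standard library, and the `JournalEntry` fields, the reading of "999 amount" as an amount whose whole-currency part ends in 999, and the `SUSPICIOUS_KEYWORDS` list are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import List, Tuple

# Hypothetical keyword list; in practice the audit team chooses the terms.
SUSPICIOUS_KEYWORDS = ("adjust", "override", "plug", "reverse")


@dataclass
class JournalEntry:
    # Hypothetical record layout for one journal entry line.
    entry_id: str
    posted: date
    amount: Decimal
    description: str


def ends_in_999(entry: JournalEntry) -> bool:
    # Assumption: a "999 amount" is one whose whole-currency part ends in 999,
    # e.g. 4,999.00 or -19,999.50.
    return str(int(abs(entry.amount))).endswith("999")


def posted_on_weekend(entry: JournalEntry) -> bool:
    # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    return entry.posted.weekday() >= 5


def keyword_hits(entry: JournalEntry, keywords=SUSPICIOUS_KEYWORDS) -> List[str]:
    # Case-insensitive substring match against the entry description.
    text = entry.description.lower()
    return [kw for kw in keywords if kw in text]


def flag_entries(entries) -> List[Tuple[str, List[str]]]:
    """Return (entry_id, reasons) for every entry that trips at least one test."""
    flagged = []
    for entry in entries:
        reasons = []
        if ends_in_999(entry):
            reasons.append("999 amount")
        if posted_on_weekend(entry):
            reasons.append("weekend posting")
        hits = keyword_hits(entry)
        if hits:
            reasons.append("keywords: " + ", ".join(hits))
        if reasons:
            flagged.append((entry.entry_id, reasons))
    return flagged


if __name__ == "__main__":
    sample = [
        JournalEntry("JE1", date(2016, 12, 9), Decimal("4999.00"), "Quarterly accrual"),
        JournalEntry("JE2", date(2016, 12, 10), Decimal("1250.00"), "Vendor payment"),
        JournalEntry("JE3", date(2016, 12, 12), Decimal("300.00"), "Manual adjustment per CFO"),
        JournalEntry("JE4", date(2016, 12, 12), Decimal("875.25"), "Office supplies"),
    ]
    for entry_id, reasons in flag_entries(sample):
        print(entry_id, reasons)
```

With this sample data, JE1 is flagged for its 999 amount, JE2 because 10 December 2016 was a Saturday, and JE3 for the keyword "adjust"; JE4 passes all three tests. Plain substring matching is crude (for example, "plug" would also match "plugin"), so a production keyword test would likely need word-boundary matching.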

Steps:

Import libraries

Import data

Wrangle as needed

Export results to a folder

Email the results

Schedule - Task Scheduler on Windows; cron or an equivalent on Unix-based systems, e.g., macOS and Linux
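The export and email steps above can be sketched with the Python standard library alone. This is a hedged illustration, not the author's script: `rows` is assumed to be a list of `(entry_id, reasons)` pairs produced by whatever test logic runs earlier, and the file-naming scheme, sender and recipient addresses, and the SMTP host `smtp.example.com` are all hypothetical.

```python
import csv
import smtplib
from datetime import date
from email.message import EmailMessage
from pathlib import Path


def export_results(rows, folder, run_date: date) -> Path:
    """Write (entry_id, reasons) rows to <folder>/je_exceptions_<YYYY-MM-DD>.csv."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"je_exceptions_{run_date.isoformat()}.csv"
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["entry_id", "reasons"])
        for entry_id, reasons in rows:
            writer.writerow([entry_id, "; ".join(reasons)])
    return path


def build_email(csv_path: Path, sender: str, recipients) -> EmailMessage:
    """Create a message with a short body and the exception file attached as text/csv."""
    msg = EmailMessage()
    msg["Subject"] = f"Journal entry exceptions: {csv_path.name}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content("Attached are the journal entries flagged by the scheduled tests.")
    msg.add_attachment(
        csv_path.read_text(encoding="utf-8"), subtype="csv", filename=csv_path.name
    )
    return msg


def send_email(msg: EmailMessage, host: str = "smtp.example.com", port: int = 25) -> None:
    # Hypothetical relay host; authentication and TLS depend on your mail server.
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)
```

For the scheduling step, a crontab entry such as `0 6 * * 1 python3 /path/to/je_tests.py` (hypothetical path) would run the script at 06:00 every Monday on a Unix-based system; Windows Task Scheduler can be configured for the same schedule.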

Machine Learning

In essence, a machine learning patterns from data without being explicitly programmed.

Very, very powerful technology that is transforming banking, search engines, advertising, and soon, every industry.

Examples: credit card fraud detection, targeted demographic advertising, anomaly detection in sensor data, etc.

Machine Learning Cont.

Numerous possibilities for using machine learning and related technologies, e.g., Natural Language Processing (NLP), in financial auditing

For example, an unsupervised clustering algorithm is in use at Astec Industries.
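The slides do not say which clustering algorithm or features Astec uses. As a generic illustration only, here is k-means (Lloyd's algorithm), one common unsupervised clustering technique, in plain Python, plus a helper that ranks points by distance from their own cluster centre as candidates for review. The idea of clustering per-transaction feature vectors (for example, scaled amount and posting hour) is an assumption; in practice features are usually standardized first, and a library implementation such as scikit-learn's `KMeans` is more typical than hand-written code.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _mean(points: Sequence[Point]) -> Point:
    # Coordinate-wise mean; returns an empty tuple for an empty cluster.
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100):
    """Lloyd's k-means. Returns (centroids, labels)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic farthest-point seeding: start from the first point, then
    # repeatedly add the point farthest from its nearest existing seed.
    centroids: List[Point] = [tuple(points[0])]
    while len(centroids) < k:
        centroids.append(
            tuple(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
        )
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        centroids = [
            _mean([p for p, lab in zip(points, labels) if lab == j]) or centroids[j]
            for j in range(k)
        ]
    return centroids, labels


def distance_to_centroid(points, centroids, labels) -> List[float]:
    """Distance of each point from its own cluster centre; the largest are review candidates."""
    return [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
```

The farthest-point seeding keeps the example reproducible; library implementations generally use randomized k-means++ seeding with several restarts instead, and the number of clusters `k` still has to be chosen by the analyst.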

The latest developments are available only in open source software or in expensive statistical and computational packages such as SAS, which at the time of writing cost at least $9,200 upfront per single-user license plus annual fees - "SAS® Analytics Pro." SAS®. Accessed August 26, 2016. https://www.sas.com/store/software/analytics-pro/prodPERSANL.html.

Possibilities:

Time Series Machine Learning for predicting account balances

Natural Language Processing techniques for contract review and summarization - the current bottleneck is optical character recognition (OCR) technology.

Sentiment Analysis for Journal Entry and Transaction descriptions.
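For the first possibility, predicting account balances, a useful starting point is a simple expectation model to compare machine learning approaches against. The sketch below is a deliberately simple baseline, a least-squares trend line rather than a machine learning technique: it forecasts the next period's balance from prior periods and flags the actual balance when it misses the expectation by more than a tolerance. The function names, the equally spaced periods, and the 10% default tolerance are assumptions for illustration.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_trend(balances: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line y = intercept + slope * t for t = 0, 1, ..., n-1."""
    n = len(balances)
    if n < 2:
        raise ValueError("need at least two periods")
    ts = range(n)
    t_bar, y_bar = mean(ts), mean(balances)
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, balances))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


def forecast_next(balances: Sequence[float]) -> float:
    """Expected balance for the next period (t = n) from the fitted trend."""
    intercept, slope = fit_trend(balances)
    return intercept + slope * len(balances)


def flag_deviation(balances: Sequence[float], actual: float, tolerance: float = 0.10):
    """Return (flag, expected); flag is True when actual misses by more than tolerance * |expected|."""
    expected = forecast_next(balances)
    return abs(actual - expected) > tolerance * abs(expected), expected
```

For monthly balances of 100, 110, 120, and 130, the trend forecast for the next month is 140, so an actual balance of 200 would be flagged while 142 would not. A time series machine learning model would replace `forecast_next` while the flagging logic stays the same.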

Jupyter notebooks for reproducible analytics and audit documentation

https://try.jupyter.org/

Conclusion:

Definition of Open Source Software

Unlimited possibilities for a customizable analytics experience

Scalable

Real-world example

Machine Learning and the future of audit analytics

THANK YOU!

Please Remember To Fill Out Your Session Evaluation Forms!