Loras College 2014 Business Analytics Symposium | Andy Stevens: Big Data Analytics

Big Data Analytics, R&D Robert Andrew Stevens, CFA John Deere

description

This session will cover issues and advice for implementing Big Data Analytics in a Research and Development context. In addition to the basics, it will discuss the past, present and future and touch on relevant mathematics, statistics, science, technology, economics, business, history and even some literature. For more information on the Loras College 2014 Business Analytics Symposium, the Loras College MBA in Business Analytics or the Loras College Business Analytics Certificate, visit www.loras.edu/mba or www.loras.edu/bigdata.

Transcript of Loras College 2014 Business Analytics Symposium | Andy Stevens: Big Data Analytics

Page 1: Loras College 2014 Business Analytics Symposium | Andy Stevens: Big Data Analytics

Big Data Analytics, R&D

Robert Andrew Stevens, CFA

John Deere

Page 2

Disclaimer

The information, views, and opinions contained in this presentation are those of the author and do not necessarily reflect the views and opinions of John Deere.

Page 3

Outline = Favorite Quotes

1. “when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind”

2. “it takes all the running you can do, to keep in the same place”

3. “The future is already here – it’s just not evenly distributed”

4. “The essence of strategy is the timing of the sunk cost commitment”

5. “Americans can always be counted on to do the right thing...”

Page 4

“when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind”

“I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science, whatever the matter may be.”

Lecture on “Electrical Units of Measurement” (3 May 1883), published in Popular Lectures Vol. I, p. 73; quoted in Encyclopaedia of Occupational Health and Safety (1998) by Jeanne Mager Stellman, p. 1992

http://en.wikiquote.org/wiki/William_Thomson

http://en.wikipedia.org/wiki/Lord_Kelvin

William Thomson, 1st Baron Kelvin

1824–1907

a.k.a.: Lord Kelvin

Occupation: mathematical physicist and engineer

Page 5

What is Analytics? Turning Data into Decisions

[Figure: Production Viewed as a System (Deming, W.E., Out of the Crisis, 1986, p. 4). Elements: Suppliers of Materials and Equipment; Receipt and Test of Materials; Production, Assembly, Inspection; Tests of Process, Machines, Methods, Costs; Distribution; Consumers; Consumer Research; Design and Redesign]

Take Action!

Page 6

The Road to Earlier Discovery and Shorter Decision Cycles

Page 7

Big Data in R&D at John Deere

Primarily machine data: CAN and GPS

Volume: immeasurable

Velocity: fast and furious

Variety: nothing is the same

Value: TBD

Page 8

“it takes all the running you can do, to keep in the same place”

The Red Queen's race is an incident that appears in Lewis Carroll's Through the Looking-Glass and involves the Red Queen, a representation of a Queen in chess, and Alice constantly running but remaining in the same spot.

“Well, in our country,” said Alice, still panting a little, “you'd generally get to somewhere else – if you run very fast for a long time, as we've been doing.”

“A slow sort of country!” said the Queen. “Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!”

http://en.wikipedia.org/wiki/Red_Queen's_race

http://en.wikipedia.org/wiki/Lewis_Carroll

Charles Lutwidge Dodgson

1832–1898

Pen name: Lewis Carroll

Occupation: Writer, mathematician, Anglican cleric, photographer, artist

Page 9

The Problem/Opportunity

Data generated

Data analyzed

Data captured and stored

[Remember: DIKW = Data Information Knowledge Wisdom ?]

Page 10

Ideally, if nothing changes…

Today Transition Vision

Page 11

But the data generated might grow faster than we can manage

[Ever hear of “The Internet of Things” ?]

Today Transition Vision

Page 12

So, maybe we should try to do something like this…

[“If you want to get somewhere else, you must run at least twice as fast as that!”]

Today Transition Vision
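The three charts above contrast growth in data generated with growth in data analyzed. The arithmetic behind them can be sketched as follows, assuming simple compound growth. The starting volumes and growth rates are hypothetical illustrations, not figures from the talk.

```python
def analyzed_share(generated0, analyzed0, gen_growth, ana_growth, years):
    """Share of generated data that gets analyzed, year by year.

    Assumes both volumes grow at constant annual rates (an assumption
    made for illustration, not a claim from the slides).
    """
    shares = []
    for t in range(years + 1):
        generated = generated0 * (1 + gen_growth) ** t
        analyzed = analyzed0 * (1 + ana_growth) ** t
        shares.append(analyzed / generated)
    return shares


# Hypothetical inputs: 100 units generated and 10 units analyzed today.
falling_behind = analyzed_share(100.0, 10.0, gen_growth=0.40, ana_growth=0.20, years=5)
catching_up = analyzed_share(100.0, 10.0, gen_growth=0.40, ana_growth=0.80, years=5)

for year, (slow, fast) in enumerate(zip(falling_behind, catching_up)):
    print(f"year {year}: analysis grows slower={slow:.1%}  analysis grows faster={fast:.1%}")
```

Under these assumptions the analyzed share shrinks every year whenever analysis capacity grows more slowly than data generation, and rises only when it grows faster, which is the Red Queen point.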

Page 13

A Solution: Data Science

• Applies everywhere

• Practical/feasible?

• In R&D?

http://www.dataists.com/2010/09/the-data-science-venn-diagram

Page 14

Data Science in R&D

1. Multidisciplinary Investigations (25%)

2. Models and Methods for Data (20%)

3. Computing with Data (15%)

4. Pedagogy (15%)

5. Tool Evaluation (5%)

6. Theory (20%)

W. S. Cleveland (2001). Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics. International Statistical Review, 69, 21-26.

http://www.stat.purdue.edu/~wsc/papers/datascience.pdf

Page 15

“The future is already here – it’s just not evenly distributed”

– William Gibson, quoted in The Economist, December 4, 2003

http://www.economist.com/printedition/2003-12-06

http://en.wikipedia.org/wiki/William_Gibson

William Gibson

1948–

Page 16

CERN: Solving the Mysteries of the Universe with Big Data

The Large Hadron Collider Computing Challenge

• Data volume

– High rate, large number of channels, 4 experiments

– 15 PetaBytes of new data each year → 30 PB in 2013

• Overall compute power

– Event complexity, Nb. events, thousands users

– 200 k cores → 350 k cores

– 45 PB of disk storage → 150 PB Storage

http://openlab.web.cern.ch/sites/openlab.web.cern.ch/files/presentations/Jarp_Big_Data_Boston_final.pdf (09/12/13)

Page 17

The Scientific Method

1. Formulation of a question

2. Hypothesis

3. Prediction

4. Testing

5. Analysis

http://en.wikipedia.org/wiki/Scientific_method

An 18th-century depiction of early experimentation in the field of chemistry

Page 18

“The essence of strategy is the timing of the sunk cost commitment”

Verbal communication during UIUC MBA Strategic Management class

http://www.amazon.com/Economic-Foundations-Strategy-Organizational-Science/dp/1412905435

http://business.illinois.edu/facultyprofile/faculty_profile.aspx?ID=99

Joseph T. Mahoney

1958–

Professor of Business Administration and Caterpillar Chair of Business, University of Illinois at Urbana-Champaign

Page 19

• What happens to Q as P → 0?

• Change “Household” to “Firm”

• Change “chocolate” to “software”

• Now what happens to Q as P → 0?

• How could that happen in a Big Data Analytics, R&D context?

http://catalog.flatworldknowledge.com/bookhub/reader/2992?e=coopermicro-ch07_s01

Figure 7.1 The Demand Curve of an Individual Household

Page 20

The One-Day MBA

http://www.engineeringtoolbox.com/cash-flow-diagrams-d_1231.html

http://en.wikipedia.org/wiki/Net_present_value

$$NPV = \sum_{t=0}^{n} \frac{F_t}{(1+i)^t}$$

F_0 = Sunk cost investment

• Assuming F_t does not decrease* for t > 0, what happens to NPV as F_0 → 0?

• How could that happen in a Big Data Analytics, R&D context?

• What are the implications for strategy?
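The first question on this slide can be explored numerically. The sketch below evaluates the slide's NPV sum for a shrinking up-front investment. The discount rate and cash flows are hypothetical, and F_0 is entered as a negative number on the assumption that it is an outlay.

```python
def npv(cash_flows, rate):
    """Net present value: sum of F_t / (1 + i)**t for t = 0..n."""
    return sum(f / (1 + rate) ** t for t, f in enumerate(cash_flows))


rate = 0.10                        # hypothetical discount rate i
later_flows = [40.0, 40.0, 40.0]   # hypothetical F_1..F_3, held constant

# F_0 is negative because it is treated here as an up-front outlay.
for f0 in (-100.0, -50.0, -10.0, 0.0):
    value = npv([f0] + later_flows, rate)
    print(f"F0 = {f0:7.1f}  ->  NPV = {value:7.2f}")
```

With the later cash flows held fixed, each reduction in the size of the up-front outlay raises NPV by the same amount, because F_0 is not discounted (t = 0).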

Page 21

Avoid Sunk Cost Commitments and Vendor Lock-in with Open Source

• Apache: http://www.apache.org/

– Hadoop, Hive, Mahout, Pig, Spark…

• GRASS GIS: http://grass.osgeo.org/

• Java: http://www.java.com/ + Cassandra

• Julia: http://julialang.org/

• Perl: http://www.perl.org/

• Python: http://www.python.org/

• R: http://cran.us.r-project.org/ + RHIPE

• Scala: http://scala-lang.org/ + Scalding

• SQL:

– http://www.mysql.com/

– http://www.postgresql.org/ + PostGIS

Page 22

“Americans can always be counted on to do the right thing...”

“...after they have exhausted all other possibilities.”

Also famous for: “We shall never surrender”, “peace in our time”

And many others relevant to The War on Data

http://www.quotedb.com/quotes/2313

https://en.wikipedia.org/wiki/Winston_churchill

Sir Winston Churchill

1874–1965

Profession: Member of Parliament, statesman, soldier, journalist, historian, author, painter

Page 23

Tips for winning The War on Data

Teamwork

Statistics

Partner with IT

Learn-Do-Teach

Replenish your toolbox

Math

Page 24

Pop Quiz

What are the 3 most important things in Real Estate?

1. Location

2. Location

3. Location

What are the 3 most important things in Statistics?

1. Look at the data

2. Look at the data

3. Look at the data

… especially for Big Data Analytics:

1. Look at the data before you analyze it: Exploratory Data Analysis (EDA)

2. Look at the data while you analyze it: model diagnostics

3. Look at the data after you analyze it: visualization and communication
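Here is a small illustration of why looking at the data during analysis matters. It fits a straight line to made-up data that follows a curve, then prints the residuals. The data and the `fit_line` helper are hypothetical and not taken from the talk.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up data with an obvious curve: y = x squared.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(f"slope={slope:.2f} intercept={intercept:.2f}")
for x, r in zip(xs, residuals):
    # Crude text plot: the U shape in the residuals is the diagnostic.
    print(f"x={x:2d} residual={r:5.1f} " + "#" * int(r + 5))
```

The fitted slope alone would suggest that x tells us nothing about y. The residuals, positive at both ends and negative in the middle, show that the straight-line model is the problem.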

Page 25

Other Survival Tips

• Visualization and Communication

– Tools: R & Rmd, GGobi, Tableau, ArcGIS/GRASS…

– Presentations: Tell them 3X, 5Ws

• Collaboration: working as a team

– File and code version control

– Google's R Style Guide

• Reproducible Research best practices

– Avoid errors by Potti (Duke) and Rogoff & Reinhart (Harvard)

• http://en.wikipedia.org/wiki/Anil_Potti

• http://en.wikipedia.org/wiki/Reinhart-Rogoff

Page 26

Summary = Favorite Quotes

1. “when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind”

2. “it takes all the running you can do, to keep in the same place”

3. “The future is already here – it's just not evenly distributed”

4. “The essence of strategy is the timing of the sunk cost commitment”

5. “Americans can always be counted on to do the right thing...”

“Those who cannot remember the past are condemned to repeat it.”

– George Santayana

Page 27

Q & A

Page 28

Contact Information

E-mail:[email protected] (business)

[email protected] (personal)

LinkedIn: http://www.linkedin.com/pub/robert-andrew-stevens-cfa/6a/a04/315

Twitter: https://twitter.com/RobertAndrewSt3

GitHub: https://github.com/robertandrewstevens