Asking Why

Asking “Why?”

A lesson for Data Scientists and those who manage them

Adapted from a post by Mike Stringer & Dean Malmgren, founders of Datascope

The other day we had a conversation with a bespectacled senior data scientist at another organization (named X to protect the innocent).

Many of us have had similar conversations with people like X, and many of us have been X before.

Data scientists, being curious individuals, are often drawn to projects because:

☑ they’re interesting

☑ they’re fun

☑ they’re technically challenging

☑ their boss heard about “big data” in the Wall Street Journal

These reasons are all distinctly different from trying to solve an important problem.

Important problems in business are often daunting to data scientists because they don’t strictly require data to solve…

…and there are established experts already working on them.

Operations, Product Development, Strategy, Human Resources, Marketing, IT, R&D, Sales

Yet these roles increasingly have an opportunity to use data in innovative ways, to make dents in long-standing problems where quantitative approaches have previously been impossible.

To tap this abundant resource of useful problems to solve, data scientists must:

1. learn from business domain experts about real problems

2. think creatively about whether and how data can be used as part of a solution

3. focus on problems that actually improve the business.

Working in any other order is a recipe for disillusionment about big data’s true potential.

Starting with a real problem instead of starting with some interesting dataset often leads data scientists down a completely different—and much more fruitful—path.

A real example from our work at Datascope:

In 2010, Brian Uzzi introduced us to Daegis, an e-discovery services provider.

When a company gets sued, it has to provide all documents relevant to the case.

E-discovery companies like Daegis use a combination of technology and lawyers to help sued companies provide these documents, without providing anything they don’t need to.

Early conversations circled around “social network analysis”.

Daegis’ client datasets contained millions of emails we could parse, study and visualize!

☑ Interesting

☑ Fun

☑ Technically challenging

☐ Useful to the business

But we caught ourselves, and asked one important question.

Why?

Instead of social networks, we spent the first phase of our project building a quick prototype using data from the Text Retrieval Conference (TREC).

We demonstrated that our transductive learning algorithms could reduce the number of documents that needed to be reviewed by 80-99%.

This was huge!

We were going to help Daegis gain a tremendous advantage and Daegis’ clients would be able to defend themselves from frivolous lawsuits.

+1 for the good guys. Right?

There’s that “why” again.

Had we asked about this at the beginning of the project, we would’ve known the importance of defensibility.

After more design iterations (see our Strata presentation or slides if you’re interested), we arrived at some insights: what we developed needed to be educational, transparent, and understandable.

By the end, if you had to summarize the project, it would be closer to “educating attorneys about information retrieval” than “social network analysis.”

The final result is a product that Daegis sells under the name Acumen.

This case illustrates a lesson for data scientists:

Ask why first!

But beware.

The answers to this deceptively simple question may surprise you, take you into challenging uncharted territory, and inspire you to think about problems in completely different ways.

Learn more about us at http://datasco.pe

Thanks for your attention.