SDS PODCAST EPISODE 65 WITH MICHAEL SEGALA...Kirill: Alright so ODSC, this is the Open Data Science...

SDS PODCAST

EPISODE 65

WITH

MICHAEL SEGALA

http://www.superdatascience.com/65

Kirill: This is episode number 65 with CEO and Co-founder at SFL

Scientific, Michael Segala.

(background music plays)

Welcome to the SuperDataScience podcast. My name is Kirill

Eremenko, data science coach and lifestyle entrepreneur.

And each week we bring you inspiring people and ideas to

help you build your successful career in data science.

Thanks for being here today and now let’s make the complex

simple.


Hello and welcome everybody to the SuperDataScience

podcast. Very excited about this episode because today we've

got the CEO and Co-Founder of SFL Scientific. SFL Scientific

is a top data science consulting firm and they have worked

with huge clients such as Staples, Goodyear, and Salesforce.

So what did we talk about? Well, Mike gave a quick overview

of his background. He actually worked on the Large Hadron

Collider in CERN on the Higgs Boson problem. So you might

have heard this, this was a big thing, and still is, the Higgs

Boson, the particle that's responsible for gravity. So he was

on that huge project, which is really cool.

And then we also talked about some case studies. So you'll

find this very exciting. Mike shared some case studies from

the consulting world, and how data science can be used to

drive actual business value to businesses. So I think that

was really valuable. Also Mike talked about what it is to use

data science in your organisation. So if you're a CEO,

Executive, a business owner, an entrepreneur, you'll find a

lot of valuable insights here about data strategy and how to


think about using data in your organisation. And of course

Mike gave some tips for those of you who are building data

science careers. So you'll find some very valuable golden

nuggets here which you can apply to your career.

All in all, as you can see, everything was covered in this

podcast. Very excited to bring this to you. And without

further ado, please welcome Mike Segala, the CEO and Co-

founder at SFL Scientific.


Welcome everybody to the SuperDataScience podcast. Today

I've got a very special guest on the show, Michael Segala,

who is a CEO and a Co-founder of SFL Scientific, a data

science consultancy firm. Michael, welcome to the show.

How are you going today?

Mike: Hi Kirill. Thanks, I'm good. I'm good. I'm doing great. Thank

you very much for having me on today.

Kirill: It's great to have you on. And unfortunately we didn't get a

chance to meet in person at ODSC when I was there, but at

the same time, I met with a couple of your colleagues who

were there at that time, so Daniel and Alexander. And you

guys are doing some great work, so I'm very excited to have

you on the show actually.

Mike: Great, yeah, thank you very much.

Kirill: And just to start off with, I promised you I would ask this

question. Where are you calling from?

Mike: I'm in Boston.

Kirill: And the weather is?


Mike: Yeah, it's beautiful today for the first time in about 6

months, so it is very nice to look outside and see some

warm, sunny skies.

Kirill: Fantastic, fantastic. Like I told Michael, I've never been to

London yet. But Boston feels a lot like how people describe

London, and I feel that every time it's sunny in Boston, there

should be a public holiday or something!

Mike: Not to knock on London, but I don't think we're quite as bad!

We get some nice months. London gets a little dreary. Not to

put shame on any London people listening, but Boston I

think is a bit nicer.

Kirill: Yeah, ok, that's a good start I guess. Better than London is

not a huge accomplishment. But ok, we'll give you that!

Mike: Sure.

Kirill: Alright so ODSC, this is the Open Data Science Conference.

Tell us what you guys are doing there. You guys had a whole

booth and people were coming up, talking, you were giving

stuff out, your booth was one of the most interesting ones to

chat to personally, because I learned quite a lot. You have

some very high calibre people there. Tell us what you guys

were doing at ODSC, what were you trying to achieve there.

Mike: Sure. So let me just give you roughly a few seconds about

the background, and then tell you why we're at ODSC. So at

SFL, we're a very boutique data science consulting firm. And

what we do is we work with our clients at that very

individualised professional service level to build out some

kind of unique solutions for them. So a lot of this centres

around machine learning, artificial intelligence, business


intelligence, something to get in there, understand the core

of their problem, and then start tackling it and providing

solutions for them. So at ODSC, a lot of people are there,

and they're kind of looking at what they can for their

business to drive value. That's what it's about. It's not a

theoretical exercise any more, it's a business exercise. So we

have a really nice presence at ODSC because a lot of people

are looking for education and ways to improve their business

function.

So we're basically there speaking to people, educating them,

teaching them what they could do with data science and

then of course trying to get them as customers and have

that nice rapport with them.

Kirill: Totally understand and I like that you mentioned the drive

value part of things because I feel that a few talks in the

conference were focused on that. There was this part of the

conference actually, they just launched called the CXO

Summit, which happens at the very start. And that is

specifically focused on using data science to drive value. And

I think that a lot of people don't get that part right, or even

get it wrong in the sense that they forget about – there's data

science and then there's business value. And you need to

really connect the two to make the case for data science.

And it sounds like that is exactly what you guys are doing.

Tell us a bit more. Are you guys based in Boston? Do you

only work with Boston-based clients, or do you have a

national presence in the US?

Mike: Sure. Yeah so we're a bit scattered out. So we have 3 co-

founders and we're all separated. So I'm in Boston, the


headquarters are in Boston. My other co-founder is down in

New York. And then we have our other co-founder on the

West Coast. And then we have employees scattered about.

So we're all over the US and we have clients completely

internationally. So I have clients down by you in Australia,

which is interesting because you've got to talk to them in the

middle of the night, and you see some really interesting

communications. We have people in Japan, we've had clients

from London, Germany, really all over the world. So we have

a really nice presence both locally to the Boston and New

York area and then kind of spread out all over the world.

Kirill: Ok. Gotcha. That's very interesting. It sounds like you guys

have some very high calibre people on there. Like you have a

few PhDs, you have some people with backgrounds in

engineering, statistics, and so on. How do you go about

building up this team that's so scattered and at the same

time working together to make SFL happen?

Mike: Sure. So I think it's a fun story. So the way that the 3

founders started was we were all in grad school together

down at Brown University, which is an Ivy League school

here in the US. So we were all studying particle physics at

the time. Myself and my other co-founder, we were

experimental physicists working at CERN, at the Large

Hadron Collider. Do you know CERN?

[Cross-talk]

Kirill: That's so cool. That's so cool.

Mike: Yeah so I and my other founder, we were working at the LHC

and my thesis actually was on the discovery of the Higgs

Boson, which was the –


Kirill: Oh wow! Were you part of the team?

Mike: I was. My thesis was literally on the discovery. It was just

incredible timing. It took 30 years of people putting in some

real blood, sweat and tears to build it, and then the day I

started my PhD, the thing turned on. And the day I

graduated, it turned off. I'm not exaggerating.

Kirill: That is so cool!

Mike: I mean, the timing couldn’t have been better. And then when

I got out of school, you’re hot off this big Higgs Boson Nobel

Prize discovery and I just have it on my CV. It’s so cheating

because you present this to somebody at a job interview and

they’re so impressed. Little do they know that it’s a team of

thousands and thousands of people working. But anyways,

we were all doing that together and our other founder was a

theoretical physicist at the time. And back then we were

basically data science and training. What we used to

discover the Higgs or black holes or whatever else is

fundamentally the same exact type of machine learning or

data science that’s applied in industry today.

So we were basically just training to become data scientists

before the real terminology came about. So we left academics

because sadly there’s really not too much money or

opportunities in academics anymore just because of funding

issues, and we started these professional careers as data

scientists. And to touch on one of the points you made about

business value, I think we’re still in this phase where people

are thinking about data science as a theoretical exercise

where we’re sitting down and we’re just trying to write really

cool and sophisticated algorithms to solve things, which is


great, but we really need to think about how do we take it to

the next level and what do we do with the outputs of these

models or whatever we’re building.

So we saw on the market that there’s this huge gap there

that businesses are now getting into this whole data

collection, data sourcing thing and let’s try to capitalize on

this and say, “Okay, we understand the fundamentals of

data science, but we also understand the business value

here.” So we came together, we started the company, and

then all of our technical people, they all happened to as well

be PhDs, mainly in physics and then engineering as well. So,

it’s not a culture where we demand you to be a physicist,

but from a background perspective we have a lot of that kind

of critical thinking and problem-solving skills which are

great, you know, applied to a business sense. That’s kind of

how we all got together, formed a company and then pretty

much got to where we are today.

Kirill: Gotcha. That’s very interesting. I love it. One thing I found

about people with PhDs – not everybody, of course –

sometimes there’s issues with communication, with

communicating insights, and therefore they sometimes face

challenges when working in consulting. Do you find the

same thing or do you somehow source PhDs that don’t have

that problem?

Mike: I talk to a lot of budding data scientists that come out of

school, and the problem is exactly how you’re describing it.

It’s a communication gap where they get so focused on the

academic exercise of it that they don’t really think about

what does the client care about, or what does the business


care about. So it’s great when we find somebody with the

technical skills and then the communication skills, they’re

homeruns, they’re instant hires. But a lot of the time what

we do is, we’ll work with people who have the raw technical

ability and guide them and start bringing them into calls

and meetings and seeing how to communicate properly,

such that they build that rapport themselves. But you’re

absolutely right, there is this kind of thought in the industry

that academics can’t really speak to the business, which

unfortunately is true, but we do have to bridge that gap

when we are consulting.

Kirill: Gotcha. That’s really cool. Well, it’s very exciting that you

work at a data science consulting firm, and moreover, are

the owner of a data science consulting firm. To me it just

means that you have 100% visibility of what you can expose

in the podcast and what you can’t, and I’m going to torture

you a little bit about some case studies of data science in

action. I think our listeners love that the most, when you

can actually give live examples. Could you start with a

simple case study of how you applied data science to help a

business and bring some value to that business?

Mike: Sure. So, one of the things that I’ll mention first before I

start talking about case studies, because I’m sure they’ll go

all over the place – this will make sense as I start going

through this – so what we noticed as a company was, no

matter your industry, so if you’re in pharma, healthcare,

financial services, insurance, it doesn’t matter.

Fundamentally you’re all solving the same problems.


When we look at it from a data perspective, most people are

solving things in text analytics or vision, they have an image

or there is some kind of time series pattern we’re trying to

expose, or there’s a marketing type of question. So if you

kind of think it from that perspective, all businesses are the

same and they’re solving the same problems, which is really

cool for us because then we get to work with literally all

verticals.

So, kind of coming back to your question about use cases,

and you’ll see as we talk through these I’ll mention all sorts

of random verticals, but when you think of data science it

doesn’t really matter, your vertical, as long as you can kind

of get down some of the terminology that these verticals

speak in.

So first case study: this is a bit of a boring one, I think, but

it’s actually a huge problem in so many fields. For instance,

pharma companies or finance companies, they have an

absurd amount of unstructured data. This is all text data, so

this could be — like, if you are looking at case studies or

financial reports and you’re looking to make investments or

whatever you’re trying to do from drug discovery or whatever

the use case is, you have all this information sitting there

that’s constantly coming in to you. And what you would do

as an analyst is you would literally have to comb through all

this information and start to try to extract key elements from

thousands or millions of .pdf files or Excel files or what have

you, and aggregate it into one kind of centralized

spreadsheet or something.


So one of the things that we worked pretty heavily on is

building these end-to-end natural language processing

pipelines that would automatically clean, aggregate, classify

and then extract these huge amounts of unstructured data.

So that’s one of the really interesting problems to work on

from a NLP perspective. We see a lot of this in the pharma

space, the legal space, even real estate agents are trying to

do things like this. That’s one of the more boring case

studies.

Kirill: Quick question on that. Where does that text come from in

pharmaceutical companies?

Mike: These could be things like field reports. So,

pharmacovigilance – I don’t know if you’ve ever heard of this

terminology – is basically the study of how people react to

taking drugs. It turns out that lots of people actually die

every year because they take some kind of combination of

drugs. In the U.S., we love subscribing and taking

medications that just interact really poorly with each other.

So there’s all this information out there about drug

interactions and, you know, “this did this” and “this did

that,” and you can kind of compile from just all historical

records or academic literature huge amounts of case studies

that you start doing this. If you’re a finance portfolio

company, you literally have hundreds or thousands of

investments and each one of these investments send that

data to you. So there’s always this kind of data transfer

happening out there and it’s pretty organic through most of

the organizations that they’re collecting and seeing these

data sources.


Kirill: Okay, gotcha. So, you take that text data and then you turn

it into quantified data, into numbers and so on, and then

they do something with it or do you actually deliver insights?

Mike: Right. We deliver insights. That’s the whole idea. Our whole

core mission is to not only apply the machine learning

algorithms, but allow them to get some consumption, make

it consumable and actionable. If they can’t do anything with

it, we’ve all just wasted a lot of time and money. For

instance, if you’re taking it from the fintech perspective, the

financial services, you’ve now allowed this portfolio manager

to see his investments across a thousand different

companies and start to draw insights like, “Hey, this

company is really worth investing in,” or, “I need to drop

them because I’m losing money.” There’s tons of actual

business ROI that can happen by just aggregating and

extracting tons of different unstructured text information.

Kirill: Gotcha. Okay, cool. So that’s case study number one. Let’s

move on to the next one.

Mike: All right. Number two – let’s talk about medical stuff. So,

data science is having a huge bust or boom right now in the

medical field. We work a lot with these deep learning

technologies in medicine specifically. For instance, there is

this field of study called digital histopathology, which

basically means you go to a doctor, the doctor takes some

kind of biopsy, they look at it under a microscope, and a

human sits there and says, “Do I think this is cancer or not

cancer?” It’s a very important process, you know, diagnosing

cancer is not something to take lightly, but it’s very

expensive and it’s very human-intensive.


So what we started to do is build out large scale algorithms

using these kind of really interesting state-of-the-art neural

network architectures to automatically kind of take the

cellular activity and start detecting and diagnosing things

such as cancer or other types of diagnoses purposes, like

identifying viable cells for in vitro fertilizations, or whatever

you’re trying to do, you can start applying deep learning to

that and doing it from a very precise data science way. So

that’s a really cool case study that we’re working on right

now.

Kirill: That’s really interesting and I’m really glad you mentioned

deep learning, because I find a lot of the more traditional,

bigger, larger in size consulting firms — I’m not going to

point any fingers here, but big ones with thousands of

employees — they are very slow to adapt these bleeding edge

technologies such as deep learning. Deep learning sounds

like a very sophisticated thing, something that is very hard

to put in an engagement letter for them to bring to a client

because it’s risky, you know, it’s something completely new

and they don’t know how to even wield that sword.

Whereas you guys are like, “Oh, yeah, we’re just going to

throw in some deep learning into this mix.” And why not,

right? It’s like the most powerful technology and it can get

the best accuracy out of most of the machine learning

algorithms. Why not apply that? So I’m really happy that

you mentioned that.

Mike: Yeah, you’re absolutely right. And I mean, we get asked a lot

about competition from these larger traditional management

consulting firms that now have moved to data science. And


the problems that they’re solving are those kind of low-

hanging marketing analytics problems. They really haven’t

started to dive into these more R&D-based problems. Which

is great for us, because there’s a huge market evolving there

that we can really be in the forefront of and capitalize on,

which is fun, these are really challenges and interesting

problems and there’s really not much competition, so it’s a

great space to play in.

Kirill: Yeah. It almost sounds like you guys — what you’re doing is

kind of, like, research — similar to what you’re doing in

CERN, but in business. You’re solving problems that haven’t

been solved before. How cool is that?

Mike: You’re absolutely right. It is very cool. And I’ll give you one

more case study like this. This is something that we’re going

to begin working on very soon and I’ll be a little cryptic

about it, but basically we’re going to be using deep learning

to reconstruct organs, such as your lung, that will be turned

into 3D images and then stem cells will be grown on top of

them to have fully viable organs.

Kirill: Wow! That’s insane.

Mike: Yeah. So, not too much more to say about that, but you can

just imagine the implications of taking a deep learning

exercise and turning that into an actual organ. This is the

state of the science. This is bleeding edge R&D research that

we have the privilege and the opportunities to be working on.

So it’s just incredible.

Kirill: Do you guys take people part-time? I want to come work for

you. (Laughs)


Mike: Sure. You got it.

Kirill: This is the best. This is so cool. I love it. And on a side note,

you guys are really working in health, right? You’re changing

people’s lives. I always appreciate that, when people apply

data science to impact people’s lives, because you can do it

at scale. It’s not like you just change one person’s life, which

is already an immense thing, but you’re changing

thousands, potentially millions, of people’s lives in the future

with all of these things that you’re creating and the

consulting work that you do with these companies. So hats

off to you. I think that’s a very noble thing to be doing.

Mike: Thank you.

Kirill: Okay. So, case study—we’ve done one about NLP pipelines;

case studies 2 and 2b were about deep learning. Do you

have another one for us? Case study number 3?

Mike: I have one more completely different one and this is just

because you’re in Australia.

Kirill: Okay, gotcha.

Mike: I mentioned we had an Australian client, so completely

different type of project. We had somebody who you guys call

a professional punter, who essentially, for all our U.S.

listeners, is a professional horse race better.

Kirill: Oh, that’s so cool. I didn’t even know that term.

Mike: Yeah, he’s a punter. So, I guess Australia is the world’s

largest betting country and they bet like 85% of the entire

world market. And one of the biggest things you guys bet on

is horse racing. So he was a professional horse better and he


came to us and he wanted us to build a really large set of

algorithms around predicting winners of horse races. And

it’s so different than NLP or medical things. Now we’re

talking about sports betting.

Then we get to do projects like that and you go in and that’s

just an exercise at data cleaning and really understanding

what is horse racing, what does it mean, how do you build

these features, because there’s so many nuances, there’s

jockeys and trainers and venue. It just gets crazy. So that

was a really fun project and it actually was very successful.

He was able to use this automated machine learning pipeline

that we built and I think he has something like 10-20% ROI

on his betting, which is ridiculous because it’s an automated

system. We don’t get many of those, but when we do, they’re

fun to tackle and everybody jumps for them because they’re

fun to think about.

Kirill: That’s really cool and an exciting example, completely left

field. Okay, thanks a lot for sharing those. Those are some

exciting case studies, but it sounds like you need a very

diverse set of skills for your organization to be able to deliver

on those projects. How do you find or train up your amazing

staff in order to be able to take on these challenges?

Mike: Yeah, that’s a challenge. The only specialists I look for are

specialists in the deep learning field that could do things like

those cellular type of classification problems. Because that

takes an extreme amount of just knowledge and knowhow,

and you literally need to be up-to-date on — you know, if a

paper is published that morning, you need to be an expert

on it by that night, type of thing. So we do look for


specialists there, but otherwise, we are looking for people

who just can really understand how to do data science

across all these domains. And it’s a challenge to find these

people, but they’re definitely out there. And what we do like

to do is take on some interns and work with them and really

teach them how to apply data science across all these

domains and then we kind of grow them into consultants

one day. So, you either hire really, really good people, or like

anybody else, you take people who have a lot of great raw

skills and you grow them in a couple of months or a year

into really great candidates.

Kirill: Gotcha. And let’s talk about you for a second. We’ve talked

about your background. At the same time, what I wanted to

ask is, among our listeners, about 10% are either executives

or business owners or self-employed. It’s a very interesting

example. You’re running a data science consulting firm.

What are some of the challenges you face on a daily basis?

Mike: My challenges as a consultant or running a consulting

company, is finding and landing clients. That’s the biggest

challenge.

Kirill: Wouldn’t it be the case that more and more companies are

picking up data science, realizing that value, and they’re just

running to you? Wouldn’t you have a huge a backlog of

clients? That’s the way I imagine the world right now.

Mike: Yes. However, we’re not a household name like McKinsey. So

there’s still the outreach, there’s that outbound marketing

that has to happen, to go out and talk to people and get

people to speak with you. But that’s the challenge of any

consulting company. Then you have five other people


bidding on a project. How do you prove that you’re better?

Those are the types of challenges.

The other challenge, the flip of that is, as you’ve seen, we get

all very diverse sets of projects. Being able to solve all of

these is sometimes a challenge in itself, because one day

you’re solving cancer, the next day you’re predicting horses

and the next day you’re doing things from a marketing

perspective and understanding complete user population

statistics or understanding customer bases. So, it’s staying

up on the technology that is coming through so quickly such

that I can have the conversations with the other business

owners and convince them that we are the firm to go with.

Because if we are not up-to-date, there is going to be

somebody else out there that is going to say better words or

understand the newer technology and they’re going to scoop

us. So staying up-to-date and basically convincing people to

work with us.

Kirill: Gotcha. And in terms of managing the team, do you face any

challenges there? Obviously you have some very talented

people there, but at the same time, as I understand it, it’s a

decentralized company. What are some of the problems that

you’re facing there?

Mike: You know, we actually do a really good job of not having too

many internal issues. It’s very rare that we have any issues

truly internally. Sometimes a client will be misinformed or

they won’t really understand something and then we’ll have

to stick our heads out there and talk them through that. But

it’s very rare, even though we are very disaggregated across

the country, that we do have any internal problems. Most of


us have known each other for a long time, we have a really

great relationship, and everybody is just really happy,

they’re working on really fun and hard problems, so there’s

really not too many issues, to be honest with you.

Kirill: That’s really cool. It’s good to hear because for somebody

working with you guys, if you have everything down pat

inside the organization, then your outputs are going to be

superb. There’s not going to be delays, there’s not going to

be any issues, the team is working nicely together. I think

that’s a great state to be in to achieve for a company, that

everybody is on the same page inside the team itself, and

then you worry about the external issues. That comes next.

Mike: Yeah, absolutely.

Kirill: Okay, cool. Thanks a lot for that. Let’s talk a bit more about

the tools that you use, the tools and methodologies. We’ve

already mentioned a couple of methodologies like deep

learning, natural language processing, things that you use

in your company, but how about the tools? What are some

of the most popular tools that you guys rely on? Do you use

open source software? Do you use commercial software and

things like that?

Mike: Yeah. So, everything we use is open source unless our client

has already bought something and they want us to use that.

But that’s rare. That doesn’t happen too much. So often

everything is open source. I don’t do much coding anymore,

but the team’s go-to for solving more basic problems —

XGBoost is a wonderful algorithm open source tool that lets

you solve a lot of easy to difficult types of problems. We’ve

migrated a lot to using TensorFlow. I think they really


capitalized and made deep learning the sexiness that it is

today. TensorFlow did a really nice job about marketing that.

So we have a lot of clients that ask for TensorFlow, which is

fine, we’re experts in that as well. So TensorFlow, XGBoost,

we write a lot of our code in Python, if it needs to be scaled

we use Spark or some Scala or PySpark type of derivative,

you know, the very kind of basic toolsets. If there’s large

engineering tasks, we’ll use Kafka or some kind of streaming

software, you know, the basic big data architectures. So,

we’re pretty standard and up-to-date with our software tools.

Kirill: That’s really cool. And I’m glad you mentioned Spark and

Scala. That means you guys are on top of Hadoop as well,

right?


Kirill: So, walk me through this. When a client comes into your

organization, do they know their problem already or does it

take time to help identify the problem?

Mike: Every client is phenomenally different. Let’s take a client

that needs the full spectrum of services. If you do it this

way, there’s four very distinct steps that we go through. And

each client is different, some need us for only one or the

other. But let’s pretend that you needed all of them. The first

thing we hit on is an overarching data strategy, so really

getting in and understanding what are your business

questions, what are your objectives, how are you leveraging

data, where are you today, and where do you want to be in a

year or two from now. So, that really traditional strategy –

get on the whiteboard and write stuff down. That’s really

important, because at this point you have to convince the


stakeholders, you have to talk to management, you have to

talk to data governance and set all those things up.

Once you’ve done that, you’ve defined basically what you’re

going to do, what is the data science going to be. It all comes

from your data strategy. So the next logical step is actually

not data science, it’s data engineering, setting up these

architectures, the Hadoop stacks, Spark, whatever it

happens to be. Getting in there and actually doing the data

engineering, integrating with the DevOps guys, the software

engineering people, whoever it happens to be. From there,

that’s when we finally get to the data science, so actually

taking the business questions and solving them

mathematically. That’s where we do all our algorithms, our

machine learning, all that kind of fun stuff.

The fourth step, which absolutely is the most important

step, is how do we make any of this consumable. And I kind

of mentioned this before. If you did all of those three prior

steps and they still can’t take that output of that model, the

probability score and the classification label, whatever

you’ve outputted, if that can’t drive a business somehow, if

your salesperson can’t use it, if your end user can’t do

anything with it, it’s a complete waste of time. So we really

strive to make sure that we make our final product

consumable and actionable such that they can drive your

business, increase your speed to market, sales and profit,

lower your R&D time, increase market shares, something.

We have to always drive at that.

So clients come to us in various stages of where they are

along that life cycle and we plug in accordingly. That’s a very


longwinded answer to say to you: Some clients have

absolutely no idea what they want, some clients have a very

firm idea and are correct, but a lot of the times they have a

very firm idea and it’s not necessarily correct, but it’s more

of a re-education of what they properly could do or want to

do.

Kirill: Gotcha. That’s really cool. Just to reiterate those steps,

we’ve got overarching data strategy, data engineering, data

science, and ‘make it consumable.’ I do like ‘make it

consumable’ a lot because I think that’s where you can

really differentiate yourselves – and you probably do – to the

bigger historical management consulting firms with

thousands of employees, because for them, a lot of the time

they have to be in very tight cost brackets and they have to

go in and they have to do this work and they have to make

sure, because they charge exorbitant fees, like, partners

charge $400, $700 an hour, sometimes more.

And at the same time, they have to really create this work.

And for them, once they’ve done whatever is in their scope,

they’re done and they’re out of there. They don’t stick

around to bring the value to the business. I’ve seen it quite a

few times where the work is done, but then the client is just

sitting there with this report or with this algorithm that’s

been ‘tailored’ to them and they don’t know what to do with

it. So I think it’s a very important step that you mention at

the end there – to make it consumable for the client and

educate them on how to use it going forward.

Mike: Yeah, you’re absolutely right. Our whole business is literally

built around that principle. And it’s not just the


management consulting companies that – not to knock

anybody – maybe lack there, but also our other source of

competition is black box type of products. We don’t need to

necessarily name them, but there are some data science

products out there where you take your data, your problem,

you kind of shove it (to be crude) into their black box and

you get out something. But the issue is the same.

So you’ve gone through steps 2 and 3, you’ve designed some

engineering, you did some data science, you tried to solve it,

but so what? You still need that next step of “Okay, where

do I go from here?” So that’s, in my opinion, where all of

these products are lacking, it’s the “So what? What do I do

now?” which I think is our huge differentiator, it’s “Fine, no

problem. We can build all these algorithms, but now let’s

take it to the next step.”

Kirill: Yeah, exactly. And we spoke about this a bit before the

podcast. You guys are doing a lot of different things. That

approach with the black box or with “Here’s your report, go

deal with it” is good enough for low hanging fruit a lot of the

time. It can solve problems. But what you guys are doing is

like top of the industry cutting edge technology, problems

that have never been solved before, and obviously you need

people’s input, obviously you need a lot of critical thinking to

get this off the ground, let alone bring value and actually

impact the bottom line of businesses or the mission of

businesses and ultimately impact the value that those

businesses bring to the world.

Mike: Yeah, you’re absolutely right. Yes, not to say we don’t solve

some of the lower hanging fruit, because I don’t want to put


that out there. I mean, we do all of that as well, but you’re

absolutely correct in the case that a lot of the problems that

we solve, maybe about half of them, are that kind of cutting

edge spectrum where without the critical thinking, without

the real kind of knowhow how this applies in a large scale

across other verticals and other domains, there’s just no way

that a generic solution is going to work there. So, yeah,

you’re absolutely correct in your point.

Kirill: Okay, cool. So, in order to accomplish these four steps that

you’ve outlined, how many people normally do you put onto

a project? Is it like one person? Is it two or three people

working on a project at the same time?

Mike: It depends. I mean you could logically go in order and have

one person doing it. A lot of the times we will have a main

person working on it and then somebody backing them up,

especially when you get into the machine learning side of

things where sometimes we’re trying different approaches

because machine learning, at the end of the day, nobody has

a strict strategy that’s going to work. Sometimes we’ll

parallelize the work efforts. Somebody will try a deep

learning solution, somebody will try a more traditional

feature engineering approach. We see which one works and

then we can converge, ensemble and get something a little

bit quicker out. So mainly one person, sometimes two,

usually is able to fit the bill.

Kirill: That’s really cool. And I just wanted to use this opportunity

of you being on the podcast and being the CEO of the

company to demonstrate to the world, or to people listening,

what actual value companies like yours can bring to an


organization, what actual value data science can bring.

Obviously, clients have an expense related to your services,

they have to pay for your services, but could you maybe

show off with an example of how what you’ve implemented,

how that has translated into revenue or addition to the

bottom line to the client, and what that was compared to the

expense they had in terms of ROI, how much more did you

bring than they had to pay in order to get your services?

Mike: Sure. So, without saying specific numbers, I can think off

the top of my head, maybe by the time I finish speaking I’ll

think of others, but off the top of my head, one simple one

was the largest retail Japanese client. They’re the largest

online retail company, and they wanted to do something

relatively simple where they said, “Okay, I want to send out

e-mail campaigns to my customers such that I can tell them

about deals going on in the site, you know, this computer or

this shoe is on sale. How do I target them a little bit better?”

We wrote some pretty standard segmentation models which

they were then able to go in and say, “Here are my high

value customers, here are the people I should go after, here

are the people who aren’t going to do anything for me,” and

they just literally sent out different e-mails to those different

groups of people. And just by doing that, the last time I

spoke to them they said that the return on value for that

was — I think they said upwards of $2 million. And that

literally took us maybe two or three weeks’ worth of effort to

do it and actually profile the clusters and something. That’s

phenomenal ROI, and that’s probably still on-going.


Other studies — for instance, these NLP pipelines that I was

talking about before, the cost saving on that is phenomenal

because what happens is, if you build a pipeline — let’s say

that it even takes you three, six, twelve months to do, which

has a decent upfront cost, that doesn’t necessarily make

these people fired, but it actually takes the job away of

multiple people who have to do this manually. So manual

annotators, manual people doing data entry. And if you

think of the salary of four or five people that were doing this

and the associated cost when they did it incorrectly because

every time you incorrectly enter a data point, that actually

costs a company roughly $100. So it’s very expensive from

all sorts of scenarios. If you look at these five people’s salary

over the course of a few years, you’re again in the millions of

dollars. So you can have really huge impacts.

Another one that I can think of is from a deep learning

perspective. We built — this company is doing real-time

human and object tracking. So from a security system they

put in front of their house, they can say, “Oh, there’s a car.

There’s a person. There’s a dog,” which is interesting, but

what it can really do is say, “That’s not just a person, that’s

dad,” or, “That’s the mailman,” or, “That’s a stranger.” So, it

starts to learn about who you are, what you are, what the

nuance is of people coming home or not coming home, and

starts to say, “Oh, this person is in the house. Send a

message to 911.”

So this technology that we developed actually launched their

entire product suite. Because without us, they’re just a

fancy camera that doesn’t really do anything. That now I

think has a market share in the $10-$15 million dollar


range or something. I don’t know the exact numbers. But

you can take technology out there that on its own wouldn’t

do — all it’s doing is recording, it’s no different than a $20

camera — and add some kind of really interesting logic on it

and all of a sudden you have a very viable product in the

market worth a lot of money. So, yeah, there’s a lot of real

ROI out there and sometimes it’s very easy to get to it and

sometimes it’s difficult.

Kirill: Yeah, that’s really cool. And those examples are great.

Thank you so much. It sounds like you guys are sometimes

adding to the essence of a company, that extra little bit to

really make it more efficient, more tailored, more targeted,

which is good for the customers as well because they get the

products that they want that they maybe didn’t even know

they wanted.

On the other hand, like with this camera example, you are

building the essence of a company. Without you, as you

said, it’s just like a fancy camera, just like a wrapper to a

company, and then you put the company inside it. You

probably in some cases feel like, “I wish I had that idea, I

could’ve built that company myself.”

Mike: I know. We’ve had that discussion as well. It’s terrible to say

it. If any of our clients are listening, I love you all dearly. But

sometimes you do have that thought, you know, “I think

that we’re doing the bulk of this work. Why don’t we just do

this?” But yeah, that is sometimes the thoughts that do

cross your mind.

Kirill: Yeah. But I think you’re doing something way more

interesting in the long run, right? You get to work on all


these different types of projects and you never know what

will come in the future. So, personally, if I were you, I

wouldn’t give that up in order to go and pursue some one

specific idea.

Mike: Yeah, you’re absolutely right.

Kirill: All right, that was really cool. Let’s talk a bit about how do

people become consultants in data science. Not necessarily

start a consulting firm, like you did, but for those of our

listeners who are looking for jobs or who are looking to

enhance their careers or maybe change their careers, if they

want to break into the space of consulting in data science or

get hired by a company like yours, what do they need to do?

Mike: I think that what has to happen — a lot of the times, I see

data scientists out there that work for a company, and these

could be very large companies or very small companies. They

get pigeonholed into solving one very small problem. They’re

really good at solving this exact problem for this exact

dataset, which is fine for that company, but it doesn’t give

them the skills to go out and really understand the entire

market.

So to get hired, in my opinion, at least for us as a

consultant, having that very general understanding of data

science, how it applies to different data types, like I

described before – texts, images, time series, marketing stuff

and then all the other stuff that falls in that large catchall,

and how do you apply methodologies to all of these, that

then is the type of candidate that I think will thrive in the

consulting environment.


And why I really like what we do — this is kind of a pitch out

there to people who are interested in consulting — is every

day you do get to work on different types of challenges. You

know, it will probably take you a couple of weeks to finish a

project, but you’re always doing something new, it’s always a

different challenge, you’re growing so much as a data

scientist by always seeing different data types and different

problems and different algorithms that your toolbox now is

exploding and you then could be the rock star because now

you’re seeing it applied to all these different domains and

verticals and the challenges and you can just knock them all

out of the park. So I think it’s a great thing to want to do

and to be. And to do it I think you just have to really

understand the field and come in with a great attitude to

want to learn and really just put in the hard work.

Kirill: Gotcha. And I can totally attest to that about the learning.

When I was back at Deloitte, that was such a steep and fast

and quick learning curve. I was grabbing so much

knowledge just because of the variety of projects you’re in –

you’re doing this, you’re doing that. And I always

recommend to students who ask me “Where do I start? How

do I get into data science?” I always say consulting. In my

opinion, consulting is the best place to start. I don’t know

about your company, it sounds like you have a pretty good

culture, but in other consulting firms you’ll be worked to the

ground, you’ll be working until 2:00 A.M. and so on, you

won’t have a life. But if you go through that training period

you will learn so much. It’s like going to university again and

getting paid for it. It’s a really cool thing.


Mike: Yeah, I actually think that analogy is perfect. It’s like going

to university and getting paid for it. That’s true. You have to

work hard, but the benefits are very, very good.

Kirill: Yeah. And one more thing on that is, there’s two reasons to

leave a consulting job. One is, if you stop growing. Like you

pointed out, Mike, if you stop growing — there’s a saying,

“Unless you’re learning, you’re dying.” So, yeah, once you

stop learning, once you get pigeonholed, or the projects stop

becoming interesting or you’re doing this one project for,

like, seven months or a year and it’s very iterative, it’s not

anything exciting, then you leave.

Or the other one is when the culture changes, when the

culture in an organization was great and then all of a

sudden the management changes or they change up your

partners or something like that, or the team that you’re

working with changes, then you stop enjoying your work.

But apart from that, if those two criteria are satisfied, then

you can stay in consulting for several years or maybe up to a

decade and actually learn so much and really become one of

the top world’s experts in certain areas of data science or

become one of the most multifaceted data scientists that

there is on this planet.

Mike: Yeah, absolutely. I couldn’t agree more with that.

Kirill: Okay. Yeah, that’s a really cool excurse and I hope a lot of

people will pick up from that and get inspired if you are

looking for opportunities joining a consulting firm or maybe

considering that for your future. Okay, so we’ve discussed

all of the box stuff. I’ve got some rapid-fire questions for you,

Mike. Are you ready for this?


Mike: Sure. I’m ready. Go ahead.

Kirill: Cool. We might have covered a few of them, so if that’s the

case just say and you can repeat your answer, but I’m sure

you’ve got lots of great examples. What is the biggest

challenge you ever had as a data scientist?

Mike: Oh, geez.

Kirill: Except for the Higgs Boson. (Laughs) You can’t use that one

again.

Mike: The biggest challenge I ever had—oh, boy, that’s not rapid-

fire because I have to think now.

Kirill: Oh, no, that’s totally cool. On my side it’s rapid-fire, for you

it’s—

Mike: So, I actually think the biggest challenge is not in solving the

problem, but it’s educating the person you’re solving it for.

So, having that conversation with somebody who owns the

data or who is going to use that algorithm. And this doesn’t

even have to be in a consulting environment. If your boss

needs to understand the output or another team needs to

consume whatever you’re doing, having that conversation

where you guys are speaking different languages and lingo

and they don’t understand what you’re saying, you don’t

understand what they’re saying, I think bridging that gap is

the real challenge that I’ve seen and that we face. Sometimes

it’s very easy to come by but sometimes it’s really

challenging if somebody really doesn’t understand what data

science is doing and why it’s important. That’s what I think

is the biggest challenge.


Kirill: Gotcha. Thank you. That’s a very, very cool answer. I think a

lot of people could be doing that better as well. Okay, you’ve

shared a few case studies with us, but what is a recent win

or the best one or the biggest one which you can share with

us that you’ve had with your organization which you are

very proud of? It doesn’t have to be a case study, it can be

anything that you had in your organization.

Mike: I think our first Fortune 500 client – that was really exciting.

Kirill: That’s awesome.

Mike: Yeah. We started off with a lot of start-ups or Series A type

of clients who were great, they wanted to give us a chance

and they saw the value in working with us. But I was really

determined to get that first logo that if somebody came to

the site they would know, like, “I know this company!” So

that was very, very exciting. That was our big one that made

me very happy and proud.

Kirill: That’s so cool. Congratulations on that.

Mike: Thanks.

Kirill: And for everybody listening, the website is sflscientific.com.

Go check it out, find out who that logo is. I’m curious now.

(Laughs) I’ve got to check it out. Okay. What is your one

most favourite thing about being a data scientist?

Mike: The diversity in projects. Like I said, every day is a different

challenge, every day is a different dataset, a different

thought process. It’s just so dynamic and it really keeps you

on your toes. If you’re bored as a data scientist, then you’re

doing something wrong. You’ve got to think harder because


there’s a lot of interesting things out there you can be

thinking about.

Kirill: I was about to say, you cannot be bored as a data scientist.

You’re definitely doing something wrong.

Mike: Right.

Kirill: All right, cool. Thanks a lot. And the question I’m very

excited about, very interested to get your opinion on because

of your diverse background both in physics and what you’re

doing now and your vision for your business, from where

you’re sitting, from what you know about the world of data

science and what you see, where do you think the field of

data science is going and what should our listeners prepare

for to be ready for the future?

Mike: Yeah, that’s an interesting question. So, I see there’s always

going to be these kind of low hanging challenges out there

where companies will evolve and they’ll want to understand

their customer profiles or look for log threats in security or

things like that. That’s fine. I see the field of data science

really driving business. Most decisions out there from every

business unit, from HR to operations to supply chain

management to your software engineering team, sooner than

later will all be driven by data science. Think about HR.

We’re now doing projects where you have an algorithm

predicting who is your proper candidate, so you don’t have

somebody calling and screening people or giving them exams

and grilling them. You literally can predict it through an

algorithm. When you think of it that way, it’s crazy. You take

these very traditional jobs and you’re starting to completely

not automate them, but augment them with some type of


data science. So I see it really just changing the culture and

the dynamics that happen inside businesses.

Kirill: Gotcha. That’s a great example on that HR example. We had

Ben Taylor from HireVue on a couple of months ago, and the

things they do there are crazy. Like, they can take a video of

an interview and apply deep learning to understand people’s

facial movements to understand when they’re confident,

when they’re lying, when they’re telling the truth, the speech

that they’re saying, analyse all of those things and then

predict a candidate. So I totally agree with you on that, it’s a

crazy, exciting world that we’re moving into.

Mike: Yeah. And I think if you’re not prepared — and it’s

interesting because we just did a project with an HR team at

another Fortune 500 company where they were very

traditional HR and now the people that we were working

with, they started learning how to write code in R, they

started learning how to do some basic regression. Because

they saw it, they saw that that’s going to be the next steps in

the future of their world and it’s really important. So, I think

if you’re not thinking about it from “Oh, it won’t affect my

job,” I think it will. If you told me your job, I could probably

tell you at least one part of it that we can tackle with some

kind of data science.

Kirill: Yeah, that’s totally right. And it’s very exciting to see when

you can influence people to start picking up these tools,

start getting excited about them, interested, and you’re kind

of driving not only these algorithms into an organization, but

also this cultural shift into an organization when people are

getting more excited by data science themselves.



Kirill: Okay, so we’re wrapping up the podcast here. It’s been very

exciting. Is there anything else that you’d like to share with

our listeners before I move on to the final questions?

Mike: Come check us out. If you want to talk about a job, come

talk to us about a job. If you want somebody to help you

solve some really easy, medium or difficult problems, come

check us out, or if you’d like to just chat. Yeah, I think that’s

a selfless pitch, sorry. I don’t know if that’s what you’re

looking for.

Kirill: No, that’s totally cool. I think that’s warranted. You’ve

shared so much value here. I think people would be

interested to definitely check out what you have to offer. And

I wanted to add to that. Even if you don’t have problems,

even if you don’t have specific issues where you think you

need data science, companies like SFL Scientific, as Mike

mentioned, can help you with the overarching data strategy.

And I think that is a very, very important part going into the

future where you understand “Okay, what is my data

strategy for the next 2, 5, 10 years?” 10 is a long shot, but

still, it’s not going to change your organization, it’s not going

to redirect your organization, but, like one of the CEOs I was

working for back in the day would say, it would tilt your

organization.

And if you look at the way this industry is going and the

rapid expansion, how quickly everything is going, even if you

have a 2, 3, 5 degree tilt in your strategy, that can take you

to a completely different place and that can mean the

difference between going out of business in 5 years and


staying in business and prospering and smashing your

competition in 5 years. So, even if you think you don’t have

challenges, talking about data strategy, people like Mike who

have been there, who have done everything, I think it can

bring huge value to an organization.

Mike: Yeah, you’re absolutely right. Thank you for saying that.

You’re phenomenally on point. Even when we do talk to

people, especially at conferences, who weren’t there looking

for somebody to help them, we just ask them very simple

questions. It becomes evident to them very quickly, even

though they haven’t thought about it or they don’t think

they have any of these challenger or potential to do things,

it’s very often a time when there is something sitting right

there that’s a quick win and they can solve very quickly from

a strategy perspective, from an actual algorithm perspective,

so there’s all sorts of opportunities out there.

Kirill: Yeah, totally. Thanks for sharing that. On that note, how

can our listeners contact you or get in touch with you or

contact you about a job that they need help with or maybe

once you guys are hiring they could get in touch or maybe

they could just follow your career? What are the best ways?

Mike: You can obviously just come to the site. You’ve said it a few

times, but we’re sflscientific.com. If you go on there, it says

you can send an e-mail for [email protected] or

[email protected]. Those come to me. I love seeing people

write to the company so I just have them come to me. I can’t

not look at them. If somebody wants to talk and I see them, I

instantly get in touch with them. So, send us an e-mail.

Otherwise you can contact us on LinkedIn or Twitter. We


have a Facebook page as well, I don’t think that one is as

maintained as the other ones, but all the very kind of

standard social media channels and outlets.

Kirill: Gotcha. And we’ll definitely add those to the show notes for

anybody who’s going to be looking for those. Thank you very

much for that. I just have one more question for you: What

is your one favourite book that you can recommend to our

listeners so that they can become better data scientists?

Mike: Sure. I think there’s two. The first one is “The Elements of

Statistical Learning.” It’s technical, but you’ll learn about the

algorithms and what’s happening, which isn’t always as

important when you’re just applying some scikit-learn model

through Python. But it really is nice to give you the

foundations of what’s there across supervised,

unsupervised, regression, classification and cross-validation.

It really goes into that in depth. When I have conversations

with people that are interviewing or that are kind of first

coming up, I probably ask questions that are very found in

that book. So, I think having a nice grasp — obviously you

don’t have to know every single word, but that’s really nice

from a technical standpoint.

The other thing that I would suggest is some kind of book

that bridges the gap between data science and business.

Because if you want to really succeed and you want to take

it to the next level, just being the person who sits at their

computer and cranks out one or two models every week or

two and doesn’t really understand what happens next with

these models, that’s not really how you’re going to push


forward in your career either as a consultant or somebody

who is doing this in a company.

Having some kind of understanding of the entire landscape,

so data science and business, what’s the implications, how

am I driving processes – that’s what’s going to make you a

superstar in consulting or in any business that you’re in.

That would be my other suggestion.

Kirill: Okay, gotcha. Thanks a lot. So, guys, if you’re ever going for

an interview with Mike, remember to read “The Elements of

Statistical Learning,” the answers to these questions are

there. Yeah, go find yourself a book on how to apply data

science in business. We’ve had lots of recommendations on

the podcast already and there’s lots of books that can help

you out with that.

On that note, thank you so much, Mike, for coming on the

show. I really appreciate your time. I think a lot of people are

going to learn from this and benefit from your case studies.

And I’m sure you’ve inspired a lot of people to at least

consider working in consulting or engaging data science

consultants. Thank you so much.

Mike: Perfect! Thank you very much. Take care, and I hope to hear

from everybody at some point.

Kirill: So there you have it. That was Mike Segala from SFL

Scientific. I hope you enjoyed this podcast. Definitely lots

and lots of value. I’m so glad Mike came onto the show. As

you can see, there’s value for those who are running a

business, who are thinking about introducing data science,

leveraging data science in their organizations. There’s value

for those of you who are building careers in data science and


want to understand better what options out there exist and

how you can explore them and how you can enhance your

careers.

Also, we had some great case studies that Mike shared. So

I’m very happy with this session, and personally my

favourite part among all of our discussions was the notion of

learning data science through consulting. This is something

that’s very dear to me because I personally went down this

path and you’ve heard me talk about this a few times. Of

course, there are different views on this and there are

different opinions, but I believe that going into consulting to

do data science is a very powerful thing, because like Mike

and I mentioned in this session, it’s like getting a degree, an

extra degree, and also getting paid for it at the same time

because you’re learning so much, you’re working really hard,

you’re putting in the effort. There’s so many diverse

problems and tools that you’re using. It really helps you get

up to speed very quickly and break into the space.

So if you do have an opportunity or if you are considering

getting into consulting, then firms like SFL Scientific or

other data science consulting firms out there are definitely a

great option. Or even large-scale management consulting

firms are great as well as long as you make sure you don’t

get pigeonholed just because they have all these processes

quite ironed out. And of course, if you are a business owner

and you’re looking for assistance, some help, some advice

with your data science, maybe challenges or just data

science strategy, then make sure to reach out to Mike,

connect with him and pick his brain about how he can help

you or what you can do about those challenges that you’re


facing. Mike is a very, very knowledgeable guy, as you can

tell.

On that note, thank you very much for being here today.

Make sure to connect with Mike on LinkedIn and follow

them on Twitter. As you can see, he’s up to lots of great

stuff. And I’m sure they occasionally hire at SFL Scientific

and you want to be at the top of the list, one of the first

people to find out when that does happen. On that note, you

can find all of the materials mentioned in this podcast at

www.superdatascience.com/65. I hope you enjoyed today’s

session and I look forward to seeing you next time. Until

then, happy analyzing.



SDS PODCAST EPISODE 65 WITH MICHAEL SEGALA...Kirill: Alright so ODSC, this is the Open Data Science...

Documents

Transcript of SDS PODCAST EPISODE 65 WITH MICHAEL SEGALA...Kirill: Alright so ODSC, this is the Open Data Science...