SDS PODCAST EPISODE 65 WITH MICHAEL SEGALA...Kirill: Alright so ODSC, this is the Open Data Science...
Transcript of SDS PODCAST EPISODE 65 WITH MICHAEL SEGALA...Kirill: Alright so ODSC, this is the Open Data Science...
Kirill: This is episode number 65 with CEO and Co-founder at SFL
Scientific, Michael Segala.
(background music plays)
Welcome to the SuperDataScience podcast. My name is Kirill
Eremenko, data science coach and lifestyle entrepreneur.
And each week we bring you inspiring people and ideas to
help you build your successful career in data science.
Thanks for being here today and now let’s make the complex
simple.
(background music plays)
Hello and welcome everybody to the SuperDataScience
podcast. Very excited about this episode because today we've
got the CEO and Co-Founder of SFL Scientific. SFL Scientific
is a top data science consulting firm and they have worked
with huge clients such as Staples, Goodyear, and Salesforce.
So what did we talk about? Well, Mike gave a quick overview
of his background. He actually worked on the Large Hadron
Collider in CERN on the Higgs Boson problem. So you might
have heard this, this was a big thing, and still is, the Higgs
Boson, the particle that's responsible for gravity. So he was
on that huge project, which is really cool.
And then we also talked about some case studies. So you'll
find this very exciting. Mike shared some case studies from
the consulting world, and how data science can be used to
drive actual business value to businesses. So I think that
was really valuable. Also Mike talked about what it is to use
data science in your organisation. So if you're a CEO,
Executive, a business owner, an entrepreneur, you'll find a
lot of valuable insights here about data strategy and how to
think about using data in your organisation. And of course
Mike gave some tips for those of you who are building data
science careers. So you'll find some very valuable golden
nuggets here which you can apply to your career.
All in all, as you can see, everything was covered in this
podcast. Very excited to bring this to you. And without
further ado, please welcome Mike Segala, the CEO and Co-
founder at SFL Scientific.
(background music plays)
Welcome everybody to the SuperDataScience podcast. Today
I've got a very special guest on the show, Michael Segala,
who is a CEO and a Co-founder of SFL Scientific, a data
science consultancy firm. Michael, welcome to the show.
How are you going today?
Mike: Hi Kirill. Thanks, I'm good. I'm good. I'm doing great. Thank
you very much for having me on today.
Kirill: It's great to have you on. And unfortunately we didn't get a
chance to meet in person at ODSC when I was there, but at
the same time, I met with a couple of your colleagues who
were there at that time, so Daniel and Alexander. And you
guys are doing some great work, so I'm very excited to have
you on the show actually.
Mike: Great, yeah, thank you very much.
Kirill: And just to start off with, I promised you I would ask this
question. Where are you calling from?
Mike: I'm in Boston.
Kirill: And the weather is?
Mike: Yeah, it's beautiful today for the first time in about 6
months, so it is very nice to look outside and see some
warm, sunny skies.
Kirill: Fantastic, fantastic. Like I told Michael, I've never been to
London yet. But Boston feels a lot like how people describe
London, and I feel that every time it's sunny in Boston, there
should be a public holiday or something!
Mike: Not to knock on London, but I don't think we're quite as bad!
We get some nice months. London gets a little dreary. Not to
put shame on any London people listening, but Boston I
think is a bit nicer.
Kirill: Yeah, ok, that's a good start I guess. Better than London is
not a huge accomplishment. But ok, we'll give you that!
Mike: Sure.
Kirill: Alright so ODSC, this is the Open Data Science Conference.
Tell us what you guys are doing there. You guys had a whole
booth and people were coming up, talking, you were giving
stuff out, your booth was one of the most interesting ones to
chat to personally, because I learned quite a lot. You have
some very high calibre people there. Tell us what you guys
were doing at ODSC, what were you trying to achieve there.
Mike: Sure. So let me just give you roughly a few seconds about
the background, and then tell you why we're at ODSC. So at
SFL, we're a very boutique data science consulting firm. And
what we do is we work with our clients at that very
individualised professional service level to build out some
kind of unique solutions for them. So a lot of this centres
around machine learning, artificial intelligence, business
intelligence, something to get in there, understand the core
of their problem, and then start tackling it and providing
solutions for them. So at ODSC, a lot of people are there,
and they're kind of looking at what they can for their
business to drive value. That's what it's about. It's not a
theoretical exercise any more, it's a business exercise. So we
have a really nice presence at ODSC because a lot of people
are looking for education and ways to improve their business
function.
So we're basically there speaking to people, educating them,
teaching them what they could do with data science and
then of course trying to get them as customers and have
that nice rapport with them.
Kirill: Totally understand and I like that you mentioned the drive
value part of things because I feel that a few talks in the
conference were focused on that. There was this part of the
conference actually, they just launched called the CXO
Summit, which happens at the very start. And that is
specifically focused on using data science to drive value. And
I think that a lot of people don't get that part right, or even
get it wrong in the sense that they forget about – there's data
science and then there's business value. And you need to
really connect the two to make the case for data science.
And it sounds like that is exactly what you guys are doing.
Tell us a bit more. Are you guys based in Boston? Do you
only work with Boston-based clients, or do you have a
national presence in the US?
Mike: Sure. Yeah so we're a bit scattered out. So we have 3 co-
founders and we're all separated. So I'm in Boston, the
headquarters are in Boston. My other co-founder is down in
New York. And then we have our other co-founder on the
West Coast. And then we have employees scattered about.
So we're all over the US and we have clients completely
internationally. So I have clients down by you in Australia,
which is interesting because you've got to talk to them in the
middle of the night, and you see some really interesting
communications. We have people in Japan, we've had clients
from London, Germany, really all over the world. So we have
a really nice presence both locally to the Boston and New
York area and then kind of spread out all over the world.
Kirill: Ok. Gotcha. That's very interesting. It sounds like you guys
have some very high calibre people on there. Like you have a
few PhDs, you have some people with backgrounds in
engineering, statistics, and so on. How do you go about
building up this team that's so scattered and at the same
time working together to make SFL happen?
Mike: Sure. So I think it's a fun story. So the way that the 3
founders started was we were all in grad school together
down at Brown University, which is an Ivy League school
here in the US. So we were all studying particle physics at
the time. Myself and my other co-founder, we were
experimental physicists working at CERN, at the Large
Hadron Collider. Do you know CERN?
[Cross-talk]
Kirill: That's so cool. That's so cool.
Mike: Yeah so I and my other founder, we were working at the LHC
and my thesis actually was on the discovery of the Higgs
Boson, which was the –
Kirill: Oh wow! Were you part of the team?
Mike: I was. My thesis was literally on the discovery. It was just
incredible timing. It took 30 years of people putting in some
real blood, sweat and tears to build it, and then the day I
started my PhD, the thing turned on. And the day I
graduated, it turned off. I'm not exaggerating.
Kirill: That is so cool!
Mike: I mean, the timing couldn’t have been better. And then when
I got out of school, you’re hot off this big Higgs Boson Nobel
Prize discovery and I just have it on my CV. It’s so cheating
because you present this to somebody at a job interview and
they’re so impressed. Little do they know that it’s a team of
thousands and thousands of people working. But anyways,
we were all doing that together and our other founder was a
theoretical physicist at the time. And back then we were
basically data science and training. What we used to
discover the Higgs or black holes or whatever else is
fundamentally the same exact type of machine learning or
data science that’s applied in industry today.
So we were basically just training to become data scientists
before the real terminology came about. So we left academics
because sadly there’s really not too much money or
opportunities in academics anymore just because of funding
issues, and we started these professional careers as data
scientists. And to touch on one of the points you made about
business value, I think we’re still in this phase where people
are thinking about data science as a theoretical exercise
where we’re sitting down and we’re just trying to write really
cool and sophisticated algorithms to solve things, which is
great, but we really need to think about how do we take it to
the next level and what do we do with the outputs of these
models or whatever we’re building.
So we saw on the market that there’s this huge gap there
that businesses are now getting into this whole data
collection, data sourcing thing and let’s try to capitalize on
this and say, “Okay, we understand the fundamentals of
data science, but we also understand the business value
here.” So we came together, we started the company, and
then all of our technical people, they all happened to as well
be PhDs, mainly in physics and then engineering as well. So,
it’s not a culture where we demand you to be a physicist,
but from a background perspective we have a lot of that kind
of critical thinking and problem-solving skills which are
great, you know, applied to a business sense. That’s kind of
how we all got together, formed a company and then pretty
much got to where we are today.
Kirill: Gotcha. That’s very interesting. I love it. One thing I found
about people with PhDs – not everybody, of course –
sometimes there’s issues with communication, with
communicating insights, and therefore they sometimes face
challenges when working in consulting. Do you find the
same thing or do you somehow source PhDs that don’t have
that problem?
Mike: I talk to a lot of budding data scientists that come out of
school, and the problem is exactly how you’re describing it.
It’s a communication gap where they get so focused on the
academic exercise of it that they don’t really think about
what does the client care about, or what does the business
care about. So it’s great when we find somebody with the
technical skills and then the communication skills, they’re
homeruns, they’re instant hires. But a lot of the time what
we do is, we’ll work with people who have the raw technical
ability and guide them and start bringing them into calls
and meetings and seeing how to communicate properly,
such that they build that rapport themselves. But you’re
absolutely right, there is this kind of thought in the industry
that academics can’t really speak to the business, which
unfortunately is true, but we do have to bridge that gap
when we are consulting.
Kirill: Gotcha. That’s really cool. Well, it’s very exciting that you
work at a data science consulting firm, and moreover, are
the owner of a data science consulting firm. To me it just
means that you have 100% visibility of what you can expose
in the podcast and what you can’t, and I’m going to torture
you a little bit about some case studies of data science in
action. I think our listeners love that the most, when you
can actually give live examples. Could you start with a
simple case study of how you applied data science to help a
business and bring some value to that business?
Mike: Sure. So, one of the things that I’ll mention first before I
start talking about case studies, because I’m sure they’ll go
all over the place – this will make sense as I start going
through this – so what we noticed as a company was, no
matter your industry, so if you’re in pharma, healthcare,
financial services, insurance, it doesn’t matter.
Fundamentally you’re all solving the same problems.
When we look at it from a data perspective, most people are
solving things in text analytics or vision, they have an image
or there is some kind of time series pattern we’re trying to
expose, or there’s a marketing type of question. So if you
kind of think it from that perspective, all businesses are the
same and they’re solving the same problems, which is really
cool for us because then we get to work with literally all
verticals.
So, kind of coming back to your question about use cases,
and you’ll see as we talk through these I’ll mention all sorts
of random verticals, but when you think of data science it
doesn’t really matter, your vertical, as long as you can kind
of get down some of the terminology that these verticals
speak in.
So first case study: this is a bit of a boring one, I think, but
it’s actually a huge problem in so many fields. For instance,
pharma companies or finance companies, they have an
absurd amount of unstructured data. This is all text data, so
this could be — like, if you are looking at case studies or
financial reports and you’re looking to make investments or
whatever you’re trying to do from drug discovery or whatever
the use case is, you have all this information sitting there
that’s constantly coming in to you. And what you would do
as an analyst is you would literally have to comb through all
this information and start to try to extract key elements from
thousands or millions of .pdf files or Excel files or what have
you, and aggregate it into one kind of centralized
spreadsheet or something.
So one of the things that we worked pretty heavily on is
building these end-to-end natural language processing
pipelines that would automatically clean, aggregate, classify
and then extract these huge amounts of unstructured data.
So that’s one of the really interesting problems to work on
from a NLP perspective. We see a lot of this in the pharma
space, the legal space, even real estate agents are trying to
do things like this. That’s one of the more boring case
studies.
Kirill: Quick question on that. Where does that text come from in
pharmaceutical companies?
Mike: These could be things like field reports. So,
pharmacovigilance – I don’t know if you’ve ever heard of this
terminology – is basically the study of how people react to
taking drugs. It turns out that lots of people actually die
every year because they take some kind of combination of
drugs. In the U.S., we love subscribing and taking
medications that just interact really poorly with each other.
So there’s all this information out there about drug
interactions and, you know, “this did this” and “this did
that,” and you can kind of compile from just all historical
records or academic literature huge amounts of case studies
that you start doing this. If you’re a finance portfolio
company, you literally have hundreds or thousands of
investments and each one of these investments send that
data to you. So there’s always this kind of data transfer
happening out there and it’s pretty organic through most of
the organizations that they’re collecting and seeing these
data sources.
Kirill: Okay, gotcha. So, you take that text data and then you turn
it into quantified data, into numbers and so on, and then
they do something with it or do you actually deliver insights?
Mike: Right. We deliver insights. That’s the whole idea. Our whole
core mission is to not only apply the machine learning
algorithms, but allow them to get some consumption, make
it consumable and actionable. If they can’t do anything with
it, we’ve all just wasted a lot of time and money. For
instance, if you’re taking it from the fintech perspective, the
financial services, you’ve now allowed this portfolio manager
to see his investments across a thousand different
companies and start to draw insights like, “Hey, this
company is really worth investing in,” or, “I need to drop
them because I’m losing money.” There’s tons of actual
business ROI that can happen by just aggregating and
extracting tons of different unstructured text information.
Kirill: Gotcha. Okay, cool. So that’s case study number one. Let’s
move on to the next one.
Mike: All right. Number two – let’s talk about medical stuff. So,
data science is having a huge bust or boom right now in the
medical field. We work a lot with these deep learning
technologies in medicine specifically. For instance, there is
this field of study called digital histopathology, which
basically means you go to a doctor, the doctor takes some
kind of biopsy, they look at it under a microscope, and a
human sits there and says, “Do I think this is cancer or not
cancer?” It’s a very important process, you know, diagnosing
cancer is not something to take lightly, but it’s very
expensive and it’s very human-intensive.
So what we started to do is build out large scale algorithms
using these kind of really interesting state-of-the-art neural
network architectures to automatically kind of take the
cellular activity and start detecting and diagnosing things
such as cancer or other types of diagnoses purposes, like
identifying viable cells for in vitro fertilizations, or whatever
you’re trying to do, you can start applying deep learning to
that and doing it from a very precise data science way. So
that’s a really cool case study that we’re working on right
now.
Kirill: That’s really interesting and I’m really glad you mentioned
deep learning, because I find a lot of the more traditional,
bigger, larger in size consulting firms — I’m not going to
point any fingers here, but big ones with thousands of
employees — they are very slow to adapt these bleeding edge
technologies such as deep learning. Deep learning sounds
like a very sophisticated thing, something that is very hard
to put in an engagement letter for them to bring to a client
because it’s risky, you know, it’s something completely new
and they don’t know how to even wield that sword.
Whereas you guys are like, “Oh, yeah, we’re just going to
throw in some deep learning into this mix.” And why not,
right? It’s like the most powerful technology and it can get
the best accuracy out of most of the machine learning
algorithms. Why not apply that? So I’m really happy that
you mentioned that.
Mike: Yeah, you’re absolutely right. And I mean, we get asked a lot
about competition from these larger traditional management
consulting firms that now have moved to data science. And
the problems that they’re solving are those kind of low-
hanging marketing analytics problems. They really haven’t
started to dive into these more R&D-based problems. Which
is great for us, because there’s a huge market evolving there
that we can really be in the forefront of and capitalize on,
which is fun, these are really challenges and interesting
problems and there’s really not much competition, so it’s a
great space to play in.
Kirill: Yeah. It almost sounds like you guys — what you’re doing is
kind of, like, research — similar to what you’re doing in
CERN, but in business. You’re solving problems that haven’t
been solved before. How cool is that?
Mike: You’re absolutely right. It is very cool. And I’ll give you one
more case study like this. This is something that we’re going
to begin working on very soon and I’ll be a little cryptic
about it, but basically we’re going to be using deep learning
to reconstruct organs, such as your lung, that will be turned
into 3D images and then stem cells will be grown on top of
them to have fully viable organs.
Kirill: Wow! That’s insane.
Mike: Yeah. So, not too much more to say about that, but you can
just imagine the implications of taking a deep learning
exercise and turning that into an actual organ. This is the
state of the science. This is bleeding edge R&D research that
we have the privilege and the opportunities to be working on.
So it’s just incredible.
Kirill: Do you guys take people part-time? I want to come work for
you. (Laughs)
Mike: Sure. You got it.
Kirill: This is the best. This is so cool. I love it. And on a side note,
you guys are really working in health, right? You’re changing
people’s lives. I always appreciate that, when people apply
data science to impact people’s lives, because you can do it
at scale. It’s not like you just change one person’s life, which
is already an immense thing, but you’re changing
thousands, potentially millions, of people’s lives in the future
with all of these things that you’re creating and the
consulting work that you do with these companies. So hats
off to you. I think that’s a very noble thing to be doing.
Mike: Thank you.
Kirill: Okay. So, case study—we’ve done one about NLP pipelines;
case studies 2 and 2b were about deep learning. Do you
have another one for us? Case study number 3?
Mike: I have one more completely different one and this is just
because you’re in Australia.
Kirill: Okay, gotcha.
Mike: I mentioned we had an Australian client, so completely
different type of project. We had somebody who you guys call
a professional punter, who essentially, for all our U.S.
listeners, is a professional horse race better.
Kirill: Oh, that’s so cool. I didn’t even know that term.
Mike: Yeah, he’s a punter. So, I guess Australia is the world’s
largest betting country and they bet like 85% of the entire
world market. And one of the biggest things you guys bet on
is horse racing. So he was a professional horse better and he
came to us and he wanted us to build a really large set of
algorithms around predicting winners of horse races. And
it’s so different than NLP or medical things. Now we’re
talking about sports betting.
Then we get to do projects like that and you go in and that’s
just an exercise at data cleaning and really understanding
what is horse racing, what does it mean, how do you build
these features, because there’s so many nuances, there’s
jockeys and trainers and venue. It just gets crazy. So that
was a really fun project and it actually was very successful.
He was able to use this automated machine learning pipeline
that we built and I think he has something like 10-20% ROI
on his betting, which is ridiculous because it’s an automated
system. We don’t get many of those, but when we do, they’re
fun to tackle and everybody jumps for them because they’re
fun to think about.
Kirill: That’s really cool and an exciting example, completely left
field. Okay, thanks a lot for sharing those. Those are some
exciting case studies, but it sounds like you need a very
diverse set of skills for your organization to be able to deliver
on those projects. How do you find or train up your amazing
staff in order to be able to take on these challenges?
Mike: Yeah, that’s a challenge. The only specialists I look for are
specialists in the deep learning field that could do things like
those cellular type of classification problems. Because that
takes an extreme amount of just knowledge and knowhow,
and you literally need to be up-to-date on — you know, if a
paper is published that morning, you need to be an expert
on it by that night, type of thing. So we do look for
specialists there, but otherwise, we are looking for people
who just can really understand how to do data science
across all these domains. And it’s a challenge to find these
people, but they’re definitely out there. And what we do like
to do is take on some interns and work with them and really
teach them how to apply data science across all these
domains and then we kind of grow them into consultants
one day. So, you either hire really, really good people, or like
anybody else, you take people who have a lot of great raw
skills and you grow them in a couple of months or a year
into really great candidates.
Kirill: Gotcha. And let’s talk about you for a second. We’ve talked
about your background. At the same time, what I wanted to
ask is, among our listeners, about 10% are either executives
or business owners or self-employed. It’s a very interesting
example. You’re running a data science consulting firm.
What are some of the challenges you face on a daily basis?
Mike: My challenges as a consultant or running a consulting
company, is finding and landing clients. That’s the biggest
challenge.
Kirill: Wouldn’t it be the case that more and more companies are
picking up data science, realizing that value, and they’re just
running to you? Wouldn’t you have a huge a backlog of
clients? That’s the way I imagine the world right now.
Mike: Yes. However, we’re not a household name like McKinsey. So
there’s still the outreach, there’s that outbound marketing
that has to happen, to go out and talk to people and get
people to speak with you. But that’s the challenge of any
consulting company. Then you have five other people
bidding on a project. How do you prove that you’re better?
Those are the types of challenges.
The other challenge, the flip of that is, as you’ve seen, we get
all very diverse sets of projects. Being able to solve all of
these is sometimes a challenge in itself, because one day
you’re solving cancer, the next day you’re predicting horses
and the next day you’re doing things from a marketing
perspective and understanding complete user population
statistics or understanding customer bases. So, it’s staying
up on the technology that is coming through so quickly such
that I can have the conversations with the other business
owners and convince them that we are the firm to go with.
Because if we are not up-to-date, there is going to be
somebody else out there that is going to say better words or
understand the newer technology and they’re going to scoop
us. So staying up-to-date and basically convincing people to
work with us.
Kirill: Gotcha. And in terms of managing the team, do you face any
challenges there? Obviously you have some very talented
people there, but at the same time, as I understand it, it’s a
decentralized company. What are some of the problems that
you’re facing there?
Mike: You know, we actually do a really good job of not having too
many internal issues. It’s very rare that we have any issues
truly internally. Sometimes a client will be misinformed or
they won’t really understand something and then we’ll have
to stick our heads out there and talk them through that. But
it’s very rare, even though we are very disaggregated across
the country, that we do have any internal problems. Most of
us have known each other for a long time, we have a really
great relationship, and everybody is just really happy,
they’re working on really fun and hard problems, so there’s
really not too many issues, to be honest with you.
Kirill: That’s really cool. It’s good to hear because for somebody
working with you guys, if you have everything down pat
inside the organization, then your outputs are going to be
superb. There’s not going to be delays, there’s not going to
be any issues, the team is working nicely together. I think
that’s a great state to be in to achieve for a company, that
everybody is on the same page inside the team itself, and
then you worry about the external issues. That comes next.
Mike: Yeah, absolutely.
Kirill: Okay, cool. Thanks a lot for that. Let’s talk a bit more about
the tools that you use, the tools and methodologies. We’ve
already mentioned a couple of methodologies like deep
learning, natural language processing, things that you use
in your company, but how about the tools? What are some
of the most popular tools that you guys rely on? Do you use
open source software? Do you use commercial software and
things like that?
Mike: Yeah. So, everything we use is open source unless our client
has already bought something and they want us to use that.
But that’s rare. That doesn’t happen too much. So often
everything is open source. I don’t do much coding anymore,
but the team’s go-to for solving more basic problems —
XGBoost is a wonderful algorithm open source tool that lets
you solve a lot of easy to difficult types of problems. We’ve
migrated a lot to using TensorFlow. I think they really
capitalized and made deep learning the sexiness that it is
today. TensorFlow did a really nice job about marketing that.
So we have a lot of clients that ask for TensorFlow, which is
fine, we’re experts in that as well. So TensorFlow, XGBoost,
we write a lot of our code in Python, if it needs to be scaled
we use Spark or some Scala or PySpark type of derivative,
you know, the very kind of basic toolsets. If there’s large
engineering tasks, we’ll use Kafka or some kind of streaming
software, you know, the basic big data architectures. So,
we’re pretty standard and up-to-date with our software tools.
Kirill: That’s really cool. And I’m glad you mentioned Spark and
Scala. That means you guys are on top of Hadoop as well,
right?
Mike: Yeah, absolutely.
Kirill: So, walk me through this. When a client comes into your
organization, do they know their problem already or does it
take time to help identify the problem?
Mike: Every client is phenomenally different. Let’s take a client
that needs the full spectrum of services. If you do it this
way, there’s four very distinct steps that we go through. And
each client is different, some need us for only one or the
other. But let’s pretend that you needed all of them. The first
thing we hit on is an overarching data strategy, so really
getting in and understanding what are your business
questions, what are your objectives, how are you leveraging
data, where are you today, and where do you want to be in a
year or two from now. So, that really traditional strategy –
get on the whiteboard and write stuff down. That’s really
important, because at this point you have to convince the
stakeholders, you have to talk to management, you have to
talk to data governance and set all those things up.
Once you’ve done that, you’ve defined basically what you’re
going to do, what is the data science going to be. It all comes
from your data strategy. So the next logical step is actually
not data science, it’s data engineering, setting up these
architectures, the Hadoop stacks, Spark, whatever it
happens to be. Getting in there and actually doing the data
engineering, integrating with the DevOps guys, the software
engineering people, whoever it happens to be. From there,
that’s when we finally get to the data science, so actually
taking the business questions and solving them
mathematically. That’s where we do all our algorithms, our
machine learning, all that kind of fun stuff.
The fourth step, which absolutely is the most important
step, is how do we make any of this consumable. And I kind
of mentioned this before. If you did all of those three prior
steps and they still can’t take that output of that model, the
probability score and the classification label, whatever
you’ve outputted, if that can’t drive a business somehow, if
your salesperson can’t use it, if your end user can’t do
anything with it, it’s a complete waste of time. So we really
strive to make sure that we make our final product
consumable and actionable such that they can drive your
business, increase your speed to market, sales and profit,
lower your R&D time, increase market shares, something.
We have to always drive at that.
So clients come to us in various stages of where they are
along that life cycle and we plug in accordingly. That’s a very
longwinded answer to say to you: Some clients have
absolutely no idea what they want, some clients have a very
firm idea and are correct, but a lot of the times they have a
very firm idea and it’s not necessarily correct, but it’s more
of a re-education of what they properly could do or want to
do.
Kirill: Gotcha. That’s really cool. Just to reiterate those steps,
we’ve got overarching data strategy, data engineering, data
science, and ‘make it consumable.’ I do like ‘make it
consumable’ a lot because I think that’s where you can
really differentiate yourselves – and you probably do – to the
bigger historical management consulting firms with
thousands of employees, because for them, a lot of the time
they have to be in very tight cost brackets and they have to
go in and they have to do this work and they have to make
sure, because they charge exorbitant fees, like, partners
charge $400, $700 an hour, sometimes more.
And at the same time, they have to really create this work.
And for them, once they’ve done whatever is in their scope,
they’re done and they’re out of there. They don’t stick
around to bring the value to the business. I’ve seen it quite a
few times where the work is done, but then the client is just
sitting there with this report or with this algorithm that’s
been ‘tailored’ to them and they don’t know what to do with
it. So I think it’s a very important step that you mention at
the end there – to make it consumable for the client and
educate them on how to use it going forward.
Mike: Yeah, you’re absolutely right. Our whole business is literally
built around that principle. And it’s not just the
management consulting companies that – not to knock
anybody – maybe lack there, but also our other source of
competition is black box type of products. We don’t need to
necessarily name them, but there are some data science
products out there where you take your data, your problem,
you kind of shove it (to be crude) into their black box and
you get out something. But the issue is the same.
So you’ve gone through steps 2 and 3, you’ve designed some
engineering, you did some data science, you tried to solve it,
but so what? You still need that next step of “Okay, where
do I go from here?” So that’s, in my opinion, where all of
these products are lacking, it’s the “So what? What do I do
now?” which I think is our huge differentiator, it’s “Fine, no
problem. We can build all these algorithms, but now let’s
take it to the next step.”
Kirill: Yeah, exactly. And we spoke about this a bit before the
podcast. You guys are doing a lot of different things. That
approach with the black box or with “Here’s your report, go
deal with it” is good enough for low hanging fruit a lot of the
time. It can solve problems. But what you guys are doing is
like top of the industry cutting edge technology, problems
that have never been solved before, and obviously you need
people’s input, obviously you need a lot of critical thinking to
get this off the ground, let alone bring value and actually
impact the bottom line of businesses or the mission of
businesses and ultimately impact the value that those
businesses bring to the world.
Mike: Yeah, you’re absolutely right. Yes, not to say we don’t solve
some of the lower hanging fruit, because I don’t want to put
that out there. I mean, we do all of that as well, but you’re
absolutely correct in the case that a lot of the problems that
we solve, maybe about half of them, are that kind of cutting
edge spectrum where without the critical thinking, without
the real kind of knowhow how this applies in a large scale
across other verticals and other domains, there’s just no way
that a generic solution is going to work there. So, yeah,
you’re absolutely correct in your point.
Kirill: Okay, cool. So, in order to accomplish these four steps that
you’ve outlined, how many people normally do you put onto
a project? Is it like one person? Is it two or three people
working on a project at the same time?
Mike: It depends. I mean you could logically go in order and have
one person doing it. A lot of the times we will have a main
person working on it and then somebody backing them up,
especially when you get into the machine learning side of
things where sometimes we’re trying different approaches
because machine learning, at the end of the day, nobody has
a strict strategy that’s going to work. Sometimes we’ll
parallelize the work efforts. Somebody will try a deep
learning solution, somebody will try a more traditional
feature engineering approach. We see which one works and
then we can converge, ensemble and get something a little
bit quicker out. So mainly one person, sometimes two,
usually is able to fit the bill.
Kirill: That’s really cool. And I just wanted to use this opportunity
of you being on the podcast and being the CEO of the
company to demonstrate to the world, or to people listening,
what actual value companies like yours can bring to an
organization, what actual value data science can bring.
Obviously, clients have an expense related to your services,
they have to pay for your services, but could you maybe
show off with an example of how what you’ve implemented,
how that has translated into revenue or addition to the
bottom line to the client, and what that was compared to the
expense they had in terms of ROI, how much more did you
bring than they had to pay in order to get your services?
Mike: Sure. So, without saying specific numbers, I can think off
the top of my head, maybe by the time I finish speaking I’ll
think of others, but off the top of my head, one simple one
was the largest retail Japanese client. They’re the largest
online retail company, and they wanted to do something
relatively simple where they said, “Okay, I want to send out
e-mail campaigns to my customers such that I can tell them
about deals going on in the site, you know, this computer or
this shoe is on sale. How do I target them a little bit better?”
We wrote some pretty standard segmentation models which
they were then able to go in and say, “Here are my high
value customers, here are the people I should go after, here
are the people who aren’t going to do anything for me,” and
they just literally sent out different e-mails to those different
groups of people. And just by doing that, the last time I
spoke to them they said that the return on value for that
was — I think they said upwards of $2 million. And that
literally took us maybe two or three weeks’ worth of effort to
do it and actually profile the clusters and something. That’s
phenomenal ROI, and that’s probably still on-going.
Other studies — for instance, these NLP pipelines that I was
talking about before, the cost saving on that is phenomenal
because what happens is, if you build a pipeline — let’s say
that it even takes you three, six, twelve months to do, which
has a decent upfront cost, that doesn’t necessarily make
these people fired, but it actually takes the job away of
multiple people who have to do this manually. So manual
annotators, manual people doing data entry. And if you
think of the salary of four or five people that were doing this
and the associated cost when they did it incorrectly because
every time you incorrectly enter a data point, that actually
costs a company roughly $100. So it’s very expensive from
all sorts of scenarios. If you look at these five people’s salary
over the course of a few years, you’re again in the millions of
dollars. So you can have really huge impacts.
Another one that I can think of is from a deep learning
perspective. We built — this company is doing real-time
human and object tracking. So from a security system they
put in front of their house, they can say, “Oh, there’s a car.
There’s a person. There’s a dog,” which is interesting, but
what it can really do is say, “That’s not just a person, that’s
dad,” or, “That’s the mailman,” or, “That’s a stranger.” So, it
starts to learn about who you are, what you are, what the
nuance is of people coming home or not coming home, and
starts to say, “Oh, this person is in the house. Send a
message to 911.”
So this technology that we developed actually launched their
entire product suite. Because without us, they’re just a
fancy camera that doesn’t really do anything. That now I
think has a market share in the $10-$15 million dollar
range or something. I don’t know the exact numbers. But
you can take technology out there that on its own wouldn’t
do — all it’s doing is recording, it’s no different than a $20
camera — and add some kind of really interesting logic on it
and all of a sudden you have a very viable product in the
market worth a lot of money. So, yeah, there’s a lot of real
ROI out there and sometimes it’s very easy to get to it and
sometimes it’s difficult.
Kirill: Yeah, that’s really cool. And those examples are great.
Thank you so much. It sounds like you guys are sometimes
adding to the essence of a company, that extra little bit to
really make it more efficient, more tailored, more targeted,
which is good for the customers as well because they get the
products that they want that they maybe didn’t even know
they wanted.
On the other hand, like with this camera example, you are
building the essence of a company. Without you, as you
said, it’s just like a fancy camera, just like a wrapper to a
company, and then you put the company inside it. You
probably in some cases feel like, “I wish I had that idea, I
could’ve built that company myself.”
Mike: I know. We’ve had that discussion as well. It’s terrible to say
it. If any of our clients are listening, I love you all dearly. But
sometimes you do have that thought, you know, “I think
that we’re doing the bulk of this work. Why don’t we just do
this?” But yeah, that is sometimes the thoughts that do
cross your mind.
Kirill: Yeah. But I think you’re doing something way more
interesting in the long run, right? You get to work on all
these different types of projects and you never know what
will come in the future. So, personally, if I were you, I
wouldn’t give that up in order to go and pursue some one
specific idea.
Mike: Yeah, you’re absolutely right.
Kirill: All right, that was really cool. Let’s talk a bit about how do
people become consultants in data science. Not necessarily
start a consulting firm, like you did, but for those of our
listeners who are looking for jobs or who are looking to
enhance their careers or maybe change their careers, if they
want to break into the space of consulting in data science or
get hired by a company like yours, what do they need to do?
Mike: I think that what has to happen — a lot of the times, I see
data scientists out there that work for a company, and these
could be very large companies or very small companies. They
get pigeonholed into solving one very small problem. They’re
really good at solving this exact problem for this exact
dataset, which is fine for that company, but it doesn’t give
them the skills to go out and really understand the entire
market.
So to get hired, in my opinion, at least for us as a
consultant, having that very general understanding of data
science, how it applies to different data types, like I
described before – texts, images, time series, marketing stuff
and then all the other stuff that falls in that large catchall,
and how do you apply methodologies to all of these, that
then is the type of candidate that I think will thrive in the
consulting environment.
And why I really like what we do — this is kind of a pitch out
there to people who are interested in consulting — is every
day you do get to work on different types of challenges. You
know, it will probably take you a couple of weeks to finish a
project, but you’re always doing something new, it’s always a
different challenge, you’re growing so much as a data
scientist by always seeing different data types and different
problems and different algorithms that your toolbox now is
exploding and you then could be the rock star because now
you’re seeing it applied to all these different domains and
verticals and the challenges and you can just knock them all
out of the park. So I think it’s a great thing to want to do
and to be. And to do it I think you just have to really
understand the field and come in with a great attitude to
want to learn and really just put in the hard work.
Kirill: Gotcha. And I can totally attest to that about the learning.
When I was back at Deloitte, that was such a steep and fast
and quick learning curve. I was grabbing so much
knowledge just because of the variety of projects you’re in –
you’re doing this, you’re doing that. And I always
recommend to students who ask me “Where do I start? How
do I get into data science?” I always say consulting. In my
opinion, consulting is the best place to start. I don’t know
about your company, it sounds like you have a pretty good
culture, but in other consulting firms you’ll be worked to the
ground, you’ll be working until 2:00 A.M. and so on, you
won’t have a life. But if you go through that training period
you will learn so much. It’s like going to university again and
getting paid for it. It’s a really cool thing.
Mike: Yeah, I actually think that analogy is perfect. It’s like going
to university and getting paid for it. That’s true. You have to
work hard, but the benefits are very, very good.
Kirill: Yeah. And one more thing on that is, there’s two reasons to
leave a consulting job. One is, if you stop growing. Like you
pointed out, Mike, if you stop growing — there’s a saying,
“Unless you’re learning, you’re dying.” So, yeah, once you
stop learning, once you get pigeonholed, or the projects stop
becoming interesting or you’re doing this one project for,
like, seven months or a year and it’s very iterative, it’s not
anything exciting, then you leave.
Or the other one is when the culture changes, when the
culture in an organization was great and then all of a
sudden the management changes or they change up your
partners or something like that, or the team that you’re
working with changes, then you stop enjoying your work.
But apart from that, if those two criteria are satisfied, then
you can stay in consulting for several years or maybe up to a
decade and actually learn so much and really become one of
the top world’s experts in certain areas of data science or
become one of the most multifaceted data scientists that
there is on this planet.
Mike: Yeah, absolutely. I couldn’t agree more with that.
Kirill: Okay. Yeah, that’s a really cool excurse and I hope a lot of
people will pick up from that and get inspired if you are
looking for opportunities joining a consulting firm or maybe
considering that for your future. Okay, so we’ve discussed
all of the box stuff. I’ve got some rapid-fire questions for you,
Mike. Are you ready for this?
Mike: Sure. I’m ready. Go ahead.
Kirill: Cool. We might have covered a few of them, so if that’s the
case just say and you can repeat your answer, but I’m sure
you’ve got lots of great examples. What is the biggest
challenge you ever had as a data scientist?
Mike: Oh, geez.
Kirill: Except for the Higgs Boson. (Laughs) You can’t use that one
again.
Mike: The biggest challenge I ever had—oh, boy, that’s not rapid-
fire because I have to think now.
Kirill: Oh, no, that’s totally cool. On my side it’s rapid-fire, for you
it’s—
Mike: So, I actually think the biggest challenge is not in solving the
problem, but it’s educating the person you’re solving it for.
So, having that conversation with somebody who owns the
data or who is going to use that algorithm. And this doesn’t
even have to be in a consulting environment. If your boss
needs to understand the output or another team needs to
consume whatever you’re doing, having that conversation
where you guys are speaking different languages and lingo
and they don’t understand what you’re saying, you don’t
understand what they’re saying, I think bridging that gap is
the real challenge that I’ve seen and that we face. Sometimes
it’s very easy to come by but sometimes it’s really
challenging if somebody really doesn’t understand what data
science is doing and why it’s important. That’s what I think
is the biggest challenge.
Kirill: Gotcha. Thank you. That’s a very, very cool answer. I think a
lot of people could be doing that better as well. Okay, you’ve
shared a few case studies with us, but what is a recent win
or the best one or the biggest one which you can share with
us that you’ve had with your organization which you are
very proud of? It doesn’t have to be a case study, it can be
anything that you had in your organization.
Mike: I think our first Fortune 500 client – that was really exciting.
Kirill: That’s awesome.
Mike: Yeah. We started off with a lot of start-ups or Series A type
of clients who were great, they wanted to give us a chance
and they saw the value in working with us. But I was really
determined to get that first logo that if somebody came to
the site they would know, like, “I know this company!” So
that was very, very exciting. That was our big one that made
me very happy and proud.
Kirill: That’s so cool. Congratulations on that.
Mike: Thanks.
Kirill: And for everybody listening, the website is sflscientific.com.
Go check it out, find out who that logo is. I’m curious now.
(Laughs) I’ve got to check it out. Okay. What is your one
most favourite thing about being a data scientist?
Mike: The diversity in projects. Like I said, every day is a different
challenge, every day is a different dataset, a different
thought process. It’s just so dynamic and it really keeps you
on your toes. If you’re bored as a data scientist, then you’re
doing something wrong. You’ve got to think harder because
there’s a lot of interesting things out there you can be
thinking about.
Kirill: I was about to say, you cannot be bored as a data scientist.
You’re definitely doing something wrong.
Mike: Right.
Kirill: All right, cool. Thanks a lot. And the question I’m very
excited about, very interested to get your opinion on because
of your diverse background both in physics and what you’re
doing now and your vision for your business, from where
you’re sitting, from what you know about the world of data
science and what you see, where do you think the field of
data science is going and what should our listeners prepare
for to be ready for the future?
Mike: Yeah, that’s an interesting question. So, I see there’s always
going to be these kind of low hanging challenges out there
where companies will evolve and they’ll want to understand
their customer profiles or look for log threats in security or
things like that. That’s fine. I see the field of data science
really driving business. Most decisions out there from every
business unit, from HR to operations to supply chain
management to your software engineering team, sooner than
later will all be driven by data science. Think about HR.
We’re now doing projects where you have an algorithm
predicting who is your proper candidate, so you don’t have
somebody calling and screening people or giving them exams
and grilling them. You literally can predict it through an
algorithm. When you think of it that way, it’s crazy. You take
these very traditional jobs and you’re starting to completely
not automate them, but augment them with some type of
data science. So I see it really just changing the culture and
the dynamics that happen inside businesses.
Kirill: Gotcha. That’s a great example on that HR example. We had
Ben Taylor from HireVue on a couple of months ago, and the
things they do there are crazy. Like, they can take a video of
an interview and apply deep learning to understand people’s
facial movements to understand when they’re confident,
when they’re lying, when they’re telling the truth, the speech
that they’re saying, analyse all of those things and then
predict a candidate. So I totally agree with you on that, it’s a
crazy, exciting world that we’re moving into.
Mike: Yeah. And I think if you’re not prepared — and it’s
interesting because we just did a project with an HR team at
another Fortune 500 company where they were very
traditional HR and now the people that we were working
with, they started learning how to write code in R, they
started learning how to do some basic regression. Because
they saw it, they saw that that’s going to be the next steps in
the future of their world and it’s really important. So, I think
if you’re not thinking about it from “Oh, it won’t affect my
job,” I think it will. If you told me your job, I could probably
tell you at least one part of it that we can tackle with some
kind of data science.
Kirill: Yeah, that’s totally right. And it’s very exciting to see when
you can influence people to start picking up these tools,
start getting excited about them, interested, and you’re kind
of driving not only these algorithms into an organization, but
also this cultural shift into an organization when people are
getting more excited by data science themselves.
Mike: Yeah, absolutely.
Kirill: Okay, so we’re wrapping up the podcast here. It’s been very
exciting. Is there anything else that you’d like to share with
our listeners before I move on to the final questions?
Mike: Come check us out. If you want to talk about a job, come
talk to us about a job. If you want somebody to help you
solve some really easy, medium or difficult problems, come
check us out, or if you’d like to just chat. Yeah, I think that’s
a selfless pitch, sorry. I don’t know if that’s what you’re
looking for.
Kirill: No, that’s totally cool. I think that’s warranted. You’ve
shared so much value here. I think people would be
interested to definitely check out what you have to offer. And
I wanted to add to that. Even if you don’t have problems,
even if you don’t have specific issues where you think you
need data science, companies like SFL Scientific, as Mike
mentioned, can help you with the overarching data strategy.
And I think that is a very, very important part going into the
future where you understand “Okay, what is my data
strategy for the next 2, 5, 10 years?” 10 is a long shot, but
still, it’s not going to change your organization, it’s not going
to redirect your organization, but, like one of the CEOs I was
working for back in the day would say, it would tilt your
organization.
And if you look at the way this industry is going and the
rapid expansion, how quickly everything is going, even if you
have a 2, 3, 5 degree tilt in your strategy, that can take you
to a completely different place and that can mean the
difference between going out of business in 5 years and
staying in business and prospering and smashing your
competition in 5 years. So, even if you think you don’t have
challenges, talking about data strategy, people like Mike who
have been there, who have done everything, I think it can
bring huge value to an organization.
Mike: Yeah, you’re absolutely right. Thank you for saying that.
You’re phenomenally on point. Even when we do talk to
people, especially at conferences, who weren’t there looking
for somebody to help them, we just ask them very simple
questions. It becomes evident to them very quickly, even
though they haven’t thought about it or they don’t think
they have any of these challenger or potential to do things,
it’s very often a time when there is something sitting right
there that’s a quick win and they can solve very quickly from
a strategy perspective, from an actual algorithm perspective,
so there’s all sorts of opportunities out there.
Kirill: Yeah, totally. Thanks for sharing that. On that note, how
can our listeners contact you or get in touch with you or
contact you about a job that they need help with or maybe
once you guys are hiring they could get in touch or maybe
they could just follow your career? What are the best ways?
Mike: You can obviously just come to the site. You’ve said it a few
times, but we’re sflscientific.com. If you go on there, it says
you can send an e-mail for [email protected] or
[email protected]. Those come to me. I love seeing people
write to the company so I just have them come to me. I can’t
not look at them. If somebody wants to talk and I see them, I
instantly get in touch with them. So, send us an e-mail.
Otherwise you can contact us on LinkedIn or Twitter. We
have a Facebook page as well, I don’t think that one is as
maintained as the other ones, but all the very kind of
standard social media channels and outlets.
Kirill: Gotcha. And we’ll definitely add those to the show notes for
anybody who’s going to be looking for those. Thank you very
much for that. I just have one more question for you: What
is your one favourite book that you can recommend to our
listeners so that they can become better data scientists?
Mike: Sure. I think there’s two. The first one is “The Elements of
Statistical Learning.” It’s technical, but you’ll learn about the
algorithms and what’s happening, which isn’t always as
important when you’re just applying some scikit-learn model
through Python. But it really is nice to give you the
foundations of what’s there across supervised,
unsupervised, regression, classification and cross-validation.
It really goes into that in depth. When I have conversations
with people that are interviewing or that are kind of first
coming up, I probably ask questions that are very found in
that book. So, I think having a nice grasp — obviously you
don’t have to know every single word, but that’s really nice
from a technical standpoint.
The other thing that I would suggest is some kind of book
that bridges the gap between data science and business.
Because if you want to really succeed and you want to take
it to the next level, just being the person who sits at their
computer and cranks out one or two models every week or
two and doesn’t really understand what happens next with
these models, that’s not really how you’re going to push
forward in your career either as a consultant or somebody
who is doing this in a company.
Having some kind of understanding of the entire landscape,
so data science and business, what’s the implications, how
am I driving processes – that’s what’s going to make you a
superstar in consulting or in any business that you’re in.
That would be my other suggestion.
Kirill: Okay, gotcha. Thanks a lot. So, guys, if you’re ever going for
an interview with Mike, remember to read “The Elements of
Statistical Learning,” the answers to these questions are
there. Yeah, go find yourself a book on how to apply data
science in business. We’ve had lots of recommendations on
the podcast already and there’s lots of books that can help
you out with that.
On that note, thank you so much, Mike, for coming on the
show. I really appreciate your time. I think a lot of people are
going to learn from this and benefit from your case studies.
And I’m sure you’ve inspired a lot of people to at least
consider working in consulting or engaging data science
consultants. Thank you so much.
Mike: Perfect! Thank you very much. Take care, and I hope to hear
from everybody at some point.
Kirill: So there you have it. That was Mike Segala from SFL
Scientific. I hope you enjoyed this podcast. Definitely lots
and lots of value. I’m so glad Mike came onto the show. As
you can see, there’s value for those who are running a
business, who are thinking about introducing data science,
leveraging data science in their organizations. There’s value
for those of you who are building careers in data science and
want to understand better what options out there exist and
how you can explore them and how you can enhance your
careers.
Also, we had some great case studies that Mike shared. So
I’m very happy with this session, and personally my
favourite part among all of our discussions was the notion of
learning data science through consulting. This is something
that’s very dear to me because I personally went down this
path and you’ve heard me talk about this a few times. Of
course, there are different views on this and there are
different opinions, but I believe that going into consulting to
do data science is a very powerful thing, because like Mike
and I mentioned in this session, it’s like getting a degree, an
extra degree, and also getting paid for it at the same time
because you’re learning so much, you’re working really hard,
you’re putting in the effort. There’s so many diverse
problems and tools that you’re using. It really helps you get
up to speed very quickly and break into the space.
So if you do have an opportunity or if you are considering
getting into consulting, then firms like SFL Scientific or
other data science consulting firms out there are definitely a
great option. Or even large-scale management consulting
firms are great as well as long as you make sure you don’t
get pigeonholed just because they have all these processes
quite ironed out. And of course, if you are a business owner
and you’re looking for assistance, some help, some advice
with your data science, maybe challenges or just data
science strategy, then make sure to reach out to Mike,
connect with him and pick his brain about how he can help
you or what you can do about those challenges that you’re
facing. Mike is a very, very knowledgeable guy, as you can
tell.
On that note, thank you very much for being here today.
Make sure to connect with Mike on LinkedIn and follow
them on Twitter. As you can see, he’s up to lots of great
stuff. And I’m sure they occasionally hire at SFL Scientific
and you want to be at the top of the list, one of the first
people to find out when that does happen. On that note, you
can find all of the materials mentioned in this podcast at
www.superdatascience.com/65. I hope you enjoyed today’s
session and I look forward to seeing you next time. Until
then, happy analyzing.