Download - Why every developer should be a data scientist · Why every developer should be a data scientist Ivan Danyliuk Software Developer @ Status.im. #dfua The recent study from MIT has

Transcript
Page 1: Why every developer should be a data scientist · Why every developer should be a data scientist Ivan Danyliuk Software Developer @ Status.im. #dfua The recent study from MIT has

Why every developer should be a data scientist

Ivan Danyliuk, Software Developer @ Status.im


...there's an 87% chance Linus Torvalds hates your code


#dfua

"Bad programmers worry about the code. Good programmers worry about data structures and their relationships."


"Show me your [code] and conceal your [data structures], and I shall continue to be mystified.

Show me your [data structures], and I won't usually need your [code]; it'll be obvious."

Fred Brooks


Design programs around the data


Question

• Raise your hand if you have recently designed your own non-standard data structure.

• Raise your hand if you have recently implemented your own custom algorithm.

• Now, raise your hand if you have recently written a new microservice.


Our "programs" are now distributed systems.


Our "hardware" is a cloud.


[Diagram: Input → Algorithm → Output]


[Diagram: Input → Program → Output, where the program is made of several Algorithms]


[Diagram: Input → Distributed System → Output, where the distributed system is made of several Programs, each made of several Algorithms]


It's just a different scale.

You use RPC calls instead of functions.


[Diagram: a Program is made of Algorithms connected by Function calls]


[Diagram: a Distributed system is made of Programs connected by RPC, API and Queue]


• Inlining function vs external function call

• Local code vs external RPC call


Imagine a network call is as cheap as a local function call. What's the difference then?
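One way to picture this is a minimal Go sketch in which the caller depends on an interface and cannot tell whether the work happens in-process or behind a network call. The names `Converter`, `localConverter`, `remoteConverter` and the `/date` endpoint are illustrative assumptions, not from the talk; the server is an in-process test server from `net/http/httptest`.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strconv"
	"time"
)

// Converter is a hypothetical interface: callers do not care where the work runs.
type Converter interface {
	ToDate(ts int64) (string, error)
}

// localConverter does the work with a plain function call.
type localConverter struct{}

func (localConverter) ToDate(ts int64) (string, error) {
	return time.Unix(ts, 0).UTC().Format("2006-01-02"), nil
}

// remoteConverter does the same work behind an HTTP call.
// For brevity it does not check the HTTP status code.
type remoteConverter struct{ baseURL string }

func (r remoteConverter) ToDate(ts int64) (string, error) {
	resp, err := http.Get(r.baseURL + "/date?ts=" + strconv.FormatInt(ts, 10))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// newDateServer starts an in-process HTTP server that wraps the local implementation.
func newDateServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ts, err := strconv.ParseInt(r.URL.Query().Get("ts"), 10, 64)
		if err != nil {
			http.Error(w, "bad ts", http.StatusBadRequest)
			return
		}
		date, _ := localConverter{}.ToDate(ts)
		fmt.Fprint(w, date)
	}))
}

func main() {
	srv := newDateServer()
	defer srv.Close()

	// The same loop body drives both the "function" and the "RPC".
	for _, c := range []Converter{localConverter{}, remoteConverter{baseURL: srv.URL}} {
		date, err := c.ToDate(1429593869)
		fmt.Println(date, err)
	}
}
```

What differs in practice is latency and the extra failure modes of the network, which is why the remote version has more error paths than the local one.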


Algorithmia is actually already doing that


package main

import (
	"fmt"

	algorithmia "github.com/algorithmiaio/algorithmia-go"
)

func main() {
	input := 1429593869 // Unix timestamp to convert

	// Errors are ignored for brevity, as on the slide.
	var client = algorithmia.NewClient("YOUR_API_KEY", "")
	algo, _ := client.Algo("algo://ovi_mihai/TimestampToDate/0.1.0")
	resp, _ := algo.Pipe(input) // this "function call" is a network call
	response := resp.(*algorithmia.AlgoResponse)
	fmt.Println(response.Result)
}


Design programs around the data


Design programs / algorithms / systems around the data


Examples


How Twitter refactored its media storage system

Twitter story


Twitter in 2012

Backend: API, storage, resize, prepare thumbs, etc

• Handle user upload
• Create thumbnails and different size versions
• Store images
• Return on user request (view)

[Diagram labels: Upload image, Resize, Store, View image]

The Problem:
• a lot of storage space
• + 6 TB per day


Twitter

Twitter researched its data and found interesting access patterns


Twitter

• 50% of requested images are at most 15 days old

• After 20 days, the probability of an image being accessed is negligible
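Numbers like these come from a very simple question asked of the request log. A minimal Go sketch, assuming you already have, for every request, the age of the requested image in days (the function name `shareAtMost` and the sample values are made up for illustration, not Twitter's data):

```go
package main

import "fmt"

// shareAtMost returns the fraction of requests whose image was
// at most maxAgeDays old at the time of the request.
func shareAtMost(agesDays []float64, maxAgeDays float64) float64 {
	if len(agesDays) == 0 {
		return 0
	}
	hits := 0
	for _, age := range agesDays {
		if age <= maxAgeDays {
			hits++
		}
	}
	return float64(hits) / float64(len(agesDays))
}

func main() {
	// Made-up sample: age in days of the requested image, one entry per request.
	ages := []float64{0.1, 0.5, 1, 2, 3, 7, 12, 14, 18, 25, 40, 90}
	for _, limit := range []float64{1, 15, 20} {
		fmt.Printf("requests for images <= %2.0f days old: %.0f%%\n",
			limit, 100*shareAtMost(ages, limit))
	}
}
```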


Twitter in 2016

• Created server that can resize on the fly

• Slow, but it's a good space-time tradeoff.


Twitter in 2016

• Image variants kept only 20 days.

• Images older than 20 days resized on the fly.
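The resulting policy fits in a few lines. This is a sketch of the rule as stated on the slide, not Twitter's actual code; `chooseSource`, `Source` and `days` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// variantTTL mirrors the 20-day cutoff from the slide.
const variantTTL = 20 * 24 * time.Hour

// Source says where a resized image comes from.
type Source string

const (
	StoredVariant  Source = "stored variant"
	ResizeOnTheFly Source = "resize on the fly"
)

// days converts a whole number of days to a time.Duration.
func days(n int) time.Duration { return time.Duration(n) * 24 * time.Hour }

// chooseSource serves pre-built variants for recent images and
// falls back to resizing the original for older ones.
func chooseSource(age time.Duration) Source {
	if age <= variantTTL {
		return StoredVariant
	}
	return ResizeOnTheFly
}

func main() {
	for _, d := range []int{1, 20, 21, 365} {
		fmt.Printf("%3d days old: %s\n", d, chooseSource(days(d)))
	}
}
```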


Twitter in 2016

• Storage usage dropped by 4TB per day

• Half as much computing power

• Saved $6 million in 2015


How to ignore data

A story from one of my former employers


The Problem

• Similar to Twitter, they had users writing and reading stuff

• Stuff had to be filtered and searched


The Problem

• It became slow as the data grew

• DB outages lasted 2-3 days (!)


We did extensive research on the data...


...and found two things:

• the access pattern was similar to Twitter's: most of the data was "dead weight" after two weeks

• the data was isolated, and most of it (90%) was really small: a 10x10 table of strings


Property of small numbers

• Everything is fast for small N

• Linear search can outperform binary search if N is small
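A rough way to check this on your own machine is a Go sketch like the one below. The timing loop is a crude illustration, not a proper benchmark (for real measurements use `testing.B`), and the crossover point depends on hardware, so no particular result is claimed here.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// linearSearch scans xs from left to right; O(N) comparisons.
func linearSearch(xs []int, target int) int {
	for i, x := range xs {
		if x == target {
			return i
		}
	}
	return -1
}

// binarySearch requires xs to be sorted; O(log N) comparisons.
func binarySearch(xs []int, target int) int {
	i := sort.SearchInts(xs, target)
	if i < len(xs) && xs[i] == target {
		return i
	}
	return -1
}

// timeIt looks up every element of xs, `rounds` times, and returns the elapsed time.
func timeIt(search func([]int, int) int, xs []int, rounds int) time.Duration {
	start := time.Now()
	sink := 0
	for r := 0; r < rounds; r++ {
		for _, x := range xs {
			sink += search(xs, x)
		}
	}
	_ = sink // keep the results "used"
	return time.Since(start)
}

func main() {
	for _, n := range []int{8, 64, 1024} {
		xs := make([]int, n)
		for i := range xs {
			xs[i] = i * 2 // sorted input
		}
		rounds := 1000000 / n
		fmt.Printf("N=%4d linear=%v binary=%v\n",
			n, timeIt(linearSearch, xs, rounds), timeIt(binarySearch, xs, rounds))
	}
}
```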


The company decided to solve the problem...

...by switching to another DB


MySQL → ElasticSearch

"We have a problem with DB on search...

...let's switch to another DB with 'search' word in its name."


When you ignore the data, it's not software engineering anymore


How Ravelin designed its fraud detection system

Ravelin story


Ravelin

• Ravelin does fraud detection for the financial sector

• Clients make an API call to check whether to allow an order to proceed

• So the latency is critical here.


Ravelin

• They use machine learning for that

• For the machine learning they need data

• The data consists of various features


Ravelin

• But there are complex features

• They need to connect things like phone numbers, credit cards, emails, devices and vouchers

• So new people could be easily connected to known "fraudsters" with very little data.


Ravelin

• They studied the problem offline

• It was clear they needed a graph database


Ravelin

So, they looked at the major players in the graph database world...

• Neo4j
• Titan
• Cayley

...but were not happy with any of them.


Ravelin

They returned to the drawing board and asked the question:

"What data do we actually need?"


Ravelin

• "What we care about is the number of people that are connected to you."

• "And if any of those people are known fraudsters."


Ravelin

So they came up with a solution using the Union Find (disjoint-set) data structure.

It allows you to very quickly:
• find items (and the sets they are in)
• join sets


Union Find

[Diagram: elements 1-9 grouped into disjoint sets]

Can very quickly do things such as:
• Does 2 belong to the same set as 7?
• What set is 3 in?
• Merge the subsets containing 9 and 7
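The three questions above map directly onto two operations. Below is a minimal array-based sketch in Go, assuming union by size and path halving (one common textbook variant, not necessarily what the slide's diagram used). The type `DisjointSet` and the initial grouping in `main` are made up for illustration.

```go
package main

import "fmt"

// DisjointSet is an array-based union-find over the elements 0..n-1.
type DisjointSet struct {
	parent []int // parent[x] == x means x is the representative of its set
	size   []int // size[r] is the set size, valid for representatives only
}

// NewDisjointSet creates n singleton sets.
func NewDisjointSet(n int) *DisjointSet {
	ds := &DisjointSet{parent: make([]int, n), size: make([]int, n)}
	for i := range ds.parent {
		ds.parent[i] = i
		ds.size[i] = 1
	}
	return ds
}

// Find returns the representative of x's set, shortening the path as it goes.
func (ds *DisjointSet) Find(x int) int {
	for ds.parent[x] != x {
		ds.parent[x] = ds.parent[ds.parent[x]] // path halving
		x = ds.parent[x]
	}
	return x
}

// Union merges the sets containing a and b and returns the merged size.
func (ds *DisjointSet) Union(a, b int) int {
	ra, rb := ds.Find(a), ds.Find(b)
	if ra == rb {
		return ds.size[ra]
	}
	if ds.size[ra] < ds.size[rb] {
		ra, rb = rb, ra // attach the smaller tree under the larger one
	}
	ds.parent[rb] = ra
	ds.size[ra] += ds.size[rb]
	return ds.size[ra]
}

// Same reports whether a and b are in the same set.
func (ds *DisjointSet) Same(a, b int) bool { return ds.Find(a) == ds.Find(b) }

func main() {
	ds := NewDisjointSet(10) // elements 0..9; 0 is unused here
	ds.Union(1, 2)           // made-up initial grouping
	ds.Union(7, 4)
	ds.Union(9, 6)

	fmt.Println("2 in same set as 7?", ds.Same(2, 7))
	fmt.Println("representative of 3's set:", ds.Find(3))
	fmt.Println("size after merging 9 and 7:", ds.Union(9, 7))
}
```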


Union Find

• Really low memory footprint
• Crazy fast:
  • CreateSet - O(1)
  • FindSet - O(α(n))* (amortized)
  • MergeSet - O(α(n))* (amortized)
• Visualization: https://visualgo.net/en/ufds

* α(n) is the inverse Ackermann function, which grows slower than log(n)


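Union Find

type Node struct {
	Count  int32  // presumably the size of the set, kept on the root node
	Parent string // ID of the parent node
}

type UnionFind struct {
	Nodes map[string]*Node
}

// Add links a and b and returns the size of the resulting set.
// parentNodeOrNew and setParent are helpers not shown on the slide.
func (uf *UnionFind) Add(a, b string) int32 {
	first, second := uf.parentNodeOrNew(a), uf.parentNodeOrNew(b)

	var parent *Node
	if first.Parent == second.Parent {
		return first.Count // already in the same set
	} else if first.Count > second.Count {
		parent = uf.setParent(first, second)
	} else {
		parent = uf.setParent(second, first)
	}
	return parent.Count
}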
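The slide's `Add` calls two helpers, `parentNodeOrNew` and `setParent`, that are not shown. The sketch below fills them in with one plausible guess so that the snippet runs end to end. The helper bodies, `NewUnionFind` and the sample IDs are assumptions, not Ravelin's code, and this version does no path compression.

```go
package main

import "fmt"

type Node struct {
	Count  int32  // size of the set; only meaningful on a root node
	Parent string // ID of the parent; a root points at its own ID
}

type UnionFind struct {
	Nodes map[string]*Node
}

// NewUnionFind is a hypothetical constructor.
func NewUnionFind() *UnionFind { return &UnionFind{Nodes: make(map[string]*Node)} }

// parentNodeOrNew (assumed behaviour) returns the root node of id's set,
// creating a singleton set when id has not been seen before.
func (uf *UnionFind) parentNodeOrNew(id string) *Node {
	node, ok := uf.Nodes[id]
	if !ok {
		node = &Node{Count: 1, Parent: id}
		uf.Nodes[id] = node
		return node
	}
	for node.Parent != id { // walk up until a node points at itself
		id = node.Parent
		node = uf.Nodes[id]
	}
	return node
}

// setParent (assumed behaviour) hangs the root child under the root parent.
func (uf *UnionFind) setParent(parent, child *Node) *Node {
	child.Parent = parent.Parent // a root's Parent is its own ID
	parent.Count += child.Count
	return parent
}

// Add is the method from the slide: link a and b, return the set size.
func (uf *UnionFind) Add(a, b string) int32 {
	first, second := uf.parentNodeOrNew(a), uf.parentNodeOrNew(b)

	var parent *Node
	if first.Parent == second.Parent {
		return first.Count
	} else if first.Count > second.Count {
		parent = uf.setParent(first, second)
	} else {
		parent = uf.setParent(second, first)
	}
	return parent.Count
}

func main() {
	uf := NewUnionFind()
	// Made-up identifiers: each call links two things seen on the same order.
	fmt.Println(uf.Add("card:1", "phone:1"))
	fmt.Println(uf.Add("phone:1", "email:1"))
	fmt.Println(uf.Add("card:2", "email:1"))
}
```

The returned count answers the question from the earlier slide: how many things are connected to you.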


95th percentile latency is under 2 ms; the average is closer to 1 ms


• Just by analyzing the data they really needed, they simplified the system drastically.

• Neo4j is more than 1 million lines of code

• Less code by two or three orders of magnitude


• Less code
• Fewer bugs
• Less maintenance
• Smaller attack surface
• Code fully owned by the team
• Easier to refactor and grow


Systems designed around the data are much simpler than you think.


Is it enough just to look at data?


•Our brains are really bad at data analysis

•We have a lot of cognitive biases

•We can't even intuitively grasp probabilities


What's a typical request distribution?


Request distribution

• Constant interval
• Uniform distribution
• Poisson distribution
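To see why the Poisson case feels unintuitive, here is a small Go simulation. It assumes requests arrive independently at a constant average rate, which is the textbook Poisson process rather than a claim about any particular system; `poissonCounts` is a made-up helper. Even with a fixed average rate, the per-second counts come out bursty instead of evenly spaced.

```go
package main

import (
	"fmt"
	"math/rand"
)

// poissonCounts simulates a Poisson process with the given average rate
// (requests per second) and returns how many requests fell into each second.
// Inter-arrival times of a Poisson process are exponentially distributed.
func poissonCounts(seed int64, rate float64, seconds int) []int {
	rng := rand.New(rand.NewSource(seed))
	counts := make([]int, seconds)
	for t := rng.ExpFloat64() / rate; t < float64(seconds); t += rng.ExpFloat64() / rate {
		counts[int(t)]++
	}
	return counts
}

func main() {
	// Average of 5 requests per second, observed for 10 seconds.
	for sec, c := range poissonCounts(1, 5, 10) {
		fmt.Printf("second %2d: %d requests\n", sec, c)
	}
}
```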


What's the probability that two people in this room share a birthday?


Birthday Paradox

[Chart: probability that at least two people share a birthday, by number of people in the room]

People   Probability
5        2.70 %
10       11.70 %
20       41.10 %
23       50.70 %
30       70.60 %
40       89.10 %
50       97.00 %
60       99.40 %
70       99.99 %
80       99.99 %
90       100.00 %
100      100.00 %
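The curve is easy to reproduce. A minimal Go sketch, assuming 365 equally likely birthdays and independent people: the chance that all n birthdays differ is (365/365) × (364/365) × ... × ((365-n+1)/365), and the answer is one minus that. The function name `sharedBirthday` is made up.

```go
package main

import "fmt"

// sharedBirthday returns the probability that at least two of n people
// share a birthday, assuming 365 equally likely days.
func sharedBirthday(n int) float64 {
	allDistinct := 1.0
	for k := 0; k < n; k++ {
		allDistinct *= float64(365-k) / 365
	}
	return 1 - allDistinct
}

func main() {
	for _, n := range []int{5, 10, 20, 23, 30, 40, 50} {
		fmt.Printf("%3d people: %.2f%%\n", n, 100*sharedBirthday(n))
	}
}
```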


US Air Force and averages


• In the late 1940s, the USAF had a serious problem: the pilots could not keep control of their planes

• Planes were crashing even in peacetime

• Sometimes up to 17 crashes per day


• Blaming pilots and training program didn't help

• Investigations confirmed that planes were ok

• But people kept dying


• They turned the attention to the design of the cabin

• It was designed in the late 1920s for the average pilot

• Data was taken from a massive study of soldiers during the Civil War


• The USAF conducted a new study of 4000+ pilots

• Measured 140 different body parameters

• Checked how many pilots fit the average


What % of pilots fit the average on 10 parameters relevant to the cabin design?


0


• Even on just 3 metrics, only 3.5% of the pilots were "average sized"

• There was no such thing as an "average pilot"
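A back-of-the-envelope model shows why the number collapses so fast. Assume, purely for illustration, that "average" on one dimension means falling into the middle 30% of pilots, and that the dimensions are independent. Then the share of pilots who are average on d dimensions at once is 0.3^d. Real body measurements are correlated, so the observed figures (such as the 3.5% above) differ from this toy model, but the trend is the same. The function name `fractionAverage` is made up.

```go
package main

import (
	"fmt"
	"math"
)

// fractionAverage returns the share of people who are "average" on all dims
// dimensions at once, if a share p is average on each dimension and the
// dimensions are independent (both are simplifying assumptions).
func fractionAverage(p float64, dims int) float64 {
	return math.Pow(p, float64(dims))
}

func main() {
	for _, d := range []int{1, 3, 10} {
		fmt.Printf("%2d dimensions: %.4f%%\n", d, 100*fractionAverage(0.3, d))
	}
}
```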


• So the USAF ordered cabins to be made adjustable, to fit a wide range of pilots.

• Unexplained plane mishaps dropped drastically


But why?


Average in high-dimension spaces

• Our intuition is built mostly on 1 dimension

• We tend to think that the average is "where most of the values are"


Average in high-dimension spaces

But that's only the particular case of:
• 1-dimensional data
• Normal or similar distribution


Average in high-dimension spaces

• The average is actually more like a "center of mass"

• The average of a donut is inside the hole

• But for high dimensions everything is really messed up


1D Normal Distribution


2 dimensions


3 dimensions


• As the number of dimensions grows, mass moves from the center to the periphery

• In 10 dimensions, almost all values are on the edges: the "curse of dimensionality"

• As some professors say, "The N-dimensional orange is all skin"
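The orange can be checked with one line of arithmetic. For a d-dimensional ball, volume scales as radius^d, so the part that lies in the outer shell of relative thickness ε is 1 - (1 - ε)^d. The Go sketch below assumes a uniformly filled ball, which is a simplification of the normal-distribution pictures on the previous slides; `shellFraction` is a made-up name.

```go
package main

import (
	"fmt"
	"math"
)

// shellFraction returns the share of a d-dimensional ball's volume that lies
// in the outer shell whose thickness is eps times the radius.
func shellFraction(dims int, eps float64) float64 {
	return 1 - math.Pow(1-eps, float64(dims))
}

func main() {
	for _, d := range []int{1, 2, 3, 10, 100} {
		fmt.Printf("%3d dimensions: %.1f%% of the volume is in the outer 10%% skin\n",
			d, 100*shellFraction(d, 0.1))
	}
}
```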


• But our intuition is built upon 1-2-3 dimensions

• For many types of data, intuition is not enough; we need math

• Had these properties been known at the time, many deaths in the USAF could have been avoided


Data Science


• It's an interdisciplinary field

• Math, statistics, computer science, visualization, machine learning, etc


If you want to be a good software engineer, you should be passionate about data science.


• After all, learning data science improves your understanding of complex real-world problems, including politics, the economy, wars and poverty.

• It inevitably boosts your intellectual curiosity.


Where to start?


Whatever works best for you:

- video courses

- boring textbooks

- meetups and classes

- marrying a data scientist :)


Where to start?

Must-know topics:

- basic statistics

- probabilities

- R Language / Python Pandas / Go Gonum

- basics of neural networks

- basics of linear algebra


Where to start?

Video courses:

- Coursera: search "data science"

- Udemy: search "data science"

- Khan Academy

- Educational YouTube channels (they're gems!)


Summary

1. Think about the whole system as one program

2. Always ask questions about data you work with

3. To make sense of this data, learn data science


Links

• http://highscalability.com/blog/2016/4/20/how-twitter-handles-3000-images-per-second.html

• https://skillsmatter.com/skillscasts/8355-london-go-usergroup

• https://www.thestar.com/news/insight/2016/01/16/when-us-air-force-discovered-the-flaw-of-averages.html

• https://medium.com/@charlie.b.ohara/breaking-down-big-o-notation-40963a0f4e2a

• https://www.youtube.com/watch?v=gas2v1emubU

• https://algorithmia.com/algorithms/ovi_mihai/TimestampToDate

• https://en.wikipedia.org/wiki/Disjoint-set_data_structure


Ivan Danyliuk, @idanyliuk

Questions? Thank you!