Machine Learning - Challenges, Learnings & Opportunities

Transcript of Machine Learning - Challenges, Learnings & Opportunities

Page 1: Machine Learning - Challenges, Learnings & Opportunities

challenges, learnings and opportunities

presented by imron zuhri, adit, and samudra

KUDO codefest, 14 May 2016

machine learning

Page 2: Machine Learning - Challenges, Learnings & Opportunities

can a machine think?

Page 3: Machine Learning - Challenges, Learnings & Opportunities

in 1996, Garry Kasparov was not afraid of a computer, and he won

the next year, he played against a new and improved Deep Blue

and lost

Page 4: Machine Learning - Challenges, Learnings & Opportunities

this is the move that was so surprising, so un-machine-like,

that he was sure the IBM team had cheated 

Rd5

Rd1

Page 5: Machine Learning - Challenges, Learnings & Opportunities

a random move, a computer bug

to kasparov, a sign of superior intelligence

Rd5

Rd1

Page 6: Machine Learning - Challenges, Learnings & Opportunities

big data analytics is the culmination of the machine way of thinking

we can now immensely extend our memory and computational power to help us do that

Page 7: Machine Learning - Challenges, Learnings & Opportunities

what is machine learning

Page 8: Machine Learning - Challenges, Learnings & Opportunities

some definitions

a (hypnotized) user’s perspective: a scientific (witchcraft) field that researches fundamental principles from data (potions) and develops magical algorithms (spells to cast) (pascal vincent, 2015)

field of study that gives computers the ability to learn without being explicitly programmed (arthur samuel, 1959)

formal definition (tom mitchell, 1998): “a machine is said to be learning if its performance P on specific tasks T improves with experience E”

Page 9: Machine Learning - Challenges, Learnings & Opportunities

CURRENT VIEW OF ML FOUNDING DISCIPLINES

Page 10: Machine Learning - Challenges, Learnings & Opportunities

three niches for machine learning

data mining: using historical data to improve decisions
• medical records → medical knowledge

software applications that are difficult to program by hand
• autonomous driving
• image classification

user modeling
• automatic recommender systems

source: rong jin, 2013

Page 49: Machine Learning - Challenges, Learnings & Opportunities

(some) open problems in machine learning

• one-shot learning
• unsupervised learning
• reinforcement learning
• artificial general intelligence

“most of human and animal learning is unsupervised learning. If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake. We know how to make the icing and the cherry, but we don't know how to make the cake.” yann lecun

Page 50: Machine Learning - Challenges, Learnings & Opportunities

challenges in machine learning

data-related:
• abundant yet scattered data
• unstructured, noisy data
• offline-stored data (duh!)

resource-related:
• data storage space constraints
• computing power
• training time
• inve$$$tments: initial investments and running costs

Page 51: Machine Learning - Challenges, Learnings & Opportunities

challenges in machine learning

methodical issues:
• result consistency (i.e. accuracy)
• overfitting
• algorithm computational efficiency

miscellaneous:
• architectural differences / portability issues
• popularity of non-open standard, vendor-locked compute libraries/apis (rawr!)

Page 52: Machine Learning - Challenges, Learnings & Opportunities

recent breakthroughs in machine learning

deepmind atari q learner (2014)

plays 5 kinds of atari 2600 games

states: pixels in atari

actions: left/right move

reward: score

algorithm used:
• feedforward “q-learning”
• conv-net for unsupervised map of reward
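The state / action / reward framing above can be sketched with plain tabular Q-learning. This is a deliberately simplified stand-in, not DeepMind's method: the slide's agent learns from pixels with a conv-net, while this toy uses a lookup table on a hypothetical five-cell corridor where the only reward is at the right end. All names and parameter values (`train_q_table`, `alpha`, `gamma`, `epsilon`) are assumptions made for illustration.

```python
import random


def train_q_table(n_states=5, episodes=200, alpha=0.5, gamma=0.9,
                  epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy corridor (hypothetical environment).

    States are cells 0..n_states-1, actions are left/right moves,
    and the reward is 1.0 only when the agent reaches the last cell.
    """
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 = left, action 1 = right
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != n_states - 1:
            # epsilon-greedy: explore sometimes, and always on ties
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = min(max(state + moves[action], 0), n_states - 1)
            reward = 1.0 if next_state == n_states - 1 else 0.0
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action per non-terminal state according to the learned table."""
    return ["left" if row[0] > row[1] else "right" for row in q[:-1]]
```

Calling `greedy_policy(train_q_table())` gives the learned move for each non-terminal cell. In the deep variant, the table `q` is replaced by a network that maps pixels to one value per action.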

Page 53: Machine Learning - Challenges, Learnings & Opportunities

recent breakthroughs in machine learning

the translator (2015)

real-time translations of speech from/into 7 different languages

able to run even on resource-constrained embedded hardware (e.g. smartphones)

uses same engine that was used in microsoft cortana (creepy!)

Page 54: Machine Learning - Challenges, Learnings & Opportunities

Reinforcement Learning: DeepMind AlphaGo

google deepmind alphago (2016)

99.8% winning rate vs other algorithms

first program to defeat a human go champion

algorithm used: deep neural networks + monte carlo tree search

• supervised learning from expert games
• reinforcement learning vs other alphago instances

Page 55: Machine Learning - Challenges, Learnings & Opportunities

supervised learning: random forest

fernández-delgado et al. (2014) evaluated 179 classifiers on 121 data sets from uci data, result:

random forest variants rank at the top.

for kaggle competitions, try gbm: xgboost.

Page 56: Machine Learning - Challenges, Learnings & Opportunities

supervised: deep learning

don’t be fooled: dl research improves part by part, through either a new kind of layer, a new activation function, a new non-convex optimization solver, or a deeper neural net.

from rodrigo benenson's deep learning accuracies ranking

Page 57: Machine Learning - Challenges, Learnings & Opportunities

supervised: deep learning

summary:

relu works better than the sigmoid function for activation.

maxout works better when applied to dropconnect for activation function.

dropout layers work to fight overfitting.

adagrad and adadelta work better if you don’t want to tune optimization hyperparameters.

deeper layers work: highway layers and residual layers.
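Two of the points in this summary can be made concrete with a small standard-library sketch. It compares the gradients of relu and sigmoid for a large input (the sigmoid gradient nearly vanishes, which is one commonly cited reason relu trains better) and shows inverted dropout, which zeroes activations at random during training and rescales the survivors. The function names and the `rate` parameter are illustrative assumptions, not taken from the talk.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # gradient stays at 1 for any positive input, so it does not saturate
    return 1.0 if x > 0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # gradient s * (1 - s) shrinks toward 0 when |x| is large
    s = sigmoid(x)
    return s * (1.0 - s)


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: drop each unit with probability `rate` while training."""
    if not training or rate == 0:
        return list(activations)
    keep = 1.0 - rate
    # survivors are divided by `keep` so the expected activation is unchanged
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

At prediction time `dropout(..., training=False)` returns the activations untouched, so no rescaling is needed there.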

Page 58: Machine Learning - Challenges, Learnings & Opportunities

unsupervised: t-sne

t-distributed stochastic neighbor embedding

van der maaten and hinton (2008): mnist data set visualization

works best for data-viz

can be used for clustering too (if you’d bother to tweak the algo)

Page 59: Machine Learning - Challenges, Learnings & Opportunities

semi-supervised learning: ladder neural networks

Given only 100 or 1,000 labeled examples, with the rest (~50,000) unlabeled, try to predict 10,000 future examples.

● It works, even with very few labels!
● Now we don’t have to tell some interns or PhD students to label some data. :)

A Rasmus, H Valpola, M Honkala, M Berglund, and T Raiko (2015)

Page 60: Machine Learning - Challenges, Learnings & Opportunities

collaborative filtering: restricted boltzmann machine

rbm for collaborative filtering (hinton, 2008):

• it has been used in netflix and spotify algos.
• it works better than svd!
• correlation(svd, rbm): -1 < c < 1
• can be ensembled with svd to improve the prediction.

Page 61: Machine Learning - Challenges, Learnings & Opportunities

some advice for applied machine learning research (this competition)

• preprocessing: scaling & imputation
• cross-validation: choose best algos
• hyperparameter optimization
• ensembling n-models: dark knowledge

Page 62: Machine Learning - Challenges, Learnings & Opportunities

data preprocessing: scaling & imputation

raschka (2014): scaling improves prediction!

gelman (2006): predict the n/a values first, then predict on the completed data; noiseless imputation is biased!
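As a minimal sketch of the two preprocessing steps named on this slide, the snippet below fills missing values and then standardizes a feature column. Mean imputation is used only because it is the simplest stand-in; it is not the prediction-based imputation the slide attributes to gelman. The function names and the use of `None` to mark missing values are assumptions for illustration.

```python
from statistics import mean, pstdev


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def standardize(column):
    """Scale a column to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        # a constant column carries no information; map it to zeros
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]
```

A typical order is `standardize(impute_mean(column))`, with the mean and standard deviation computed on the training split only and reused for validation data.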

Page 63: Machine Learning - Challenges, Learnings & Opportunities

cross-validation: how to choose best algo?

cross-validation is a must! (tibshirani et al., 2014)

don’t overlap your cross-validation data partition!

(zhang, data robot)
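The "don't overlap your partitions" advice can be sketched as a k-fold splitter in which every sample lands in exactly one validation fold and never appears in its own training set. `k_fold_indices` is a hypothetical helper written for this illustration, not a library function.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs with non-overlapping validation folds."""
    # spread the remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size
```

For real data the indices are usually shuffled once before splitting; that step is left out here to keep the partition logic easy to read.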

Page 64: Machine Learning - Challenges, Learnings & Opportunities

hyperparameter optimization

if you want to search for the best hyperparameters: do random search.

random search is better than grid search (bergstra & bengio, 2012)
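A minimal random-search loop, assuming each hyperparameter is drawn uniformly from a (low, high) range. `toy_objective` is a made-up stand-in for a cross-validated score, and the names `lr` and `reg` are hypothetical; in practice the objective would train and validate a model.

```python
import random


def random_search(objective, space, n_trials=50, seed=0):
    """Sample hyperparameters at random and keep the best-scoring set."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # draw every hyperparameter independently from its range
        params = {name: rng.uniform(low, high)
                  for name, (low, high) in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    # hypothetical score surface: peaks at lr = 0.1, only weakly depends on reg
    return -((params["lr"] - 0.1) ** 2) - 0.01 * (params["reg"] - 1.0) ** 2


search_space = {"lr": (0.001, 1.0), "reg": (0.0, 10.0)}
best_params, best_score = random_search(toy_objective, search_space)
```

When only a few hyperparameters matter (here `lr` far more than `reg`), each random trial tests a new value of the important one, whereas a grid repeats the same few values many times.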

Page 65: Machine Learning - Challenges, Learnings & Opportunities

ensembling n-models: dark knowledge

If two models give the same accuracy but their prediction outputs have low correlation, then we can improve prediction accuracy by averaging the models' predictions. (Hinton, 2015)
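The averaging claim can be checked on a tiny hand-made example. The probabilities below are invented purely for illustration: both hypothetical models reach the same accuracy but make their mistakes on different samples, so their average corrects each of them.

```python
def accuracy(probabilities, labels, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def average_models(*model_outputs):
    """Average the predicted probabilities of several models, sample by sample."""
    return [sum(values) / len(values) for values in zip(*model_outputs)]


labels = [1, 1, 1, 1, 0, 0, 0, 0]
# made-up outputs: model_a is wrong on samples 3 and 7, model_b on samples 0 and 4
model_a = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6]
model_b = [0.4, 0.7, 0.8, 0.9, 0.6, 0.3, 0.2, 0.1]
blended = average_models(model_a, model_b)
```

Here `accuracy(model_a, labels)` and `accuracy(model_b, labels)` are both 0.75, while `accuracy(blended, labels)` is 1.0. If the two models made the same mistakes, averaging would not help.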

Page 66: Machine Learning - Challenges, Learnings & Opportunities

the landscape of opportunities

Page 68: Machine Learning - Challenges, Learnings & Opportunities

Popular Big Data Industry

Financial Services, Telco, Web/Media, Retail, Healthcare, Government

• Fraud detection

• Compliance reporting

• Portfolio analysis

• Customer statements

• Wire transfer alerts

• Customer acquisition, retention, and profitability

• Subscriber data management

• Fraud analysis

• Social analysis

• Response times

• Traffic analysis

• Product affinity/bundling

• Sentiment Analysis

• Content monetization

• Advertising optimization

• Optimization of user experience/ click stream analysis

• Network optimization to support service levels

• Store operation analysis

• Customer loyalty programs

• Collaborative planning and forecasting

• Loss prevention

• Supply chain optimization

• Drug development and launch cost reduction

• Regulatory compliance

• Product quality

• Return on promotional investment

• Lowered risk of new product success

• Security/anti-terror

• Recovery Act public disclosure

• Budgetary control and management

• Educational reporting

• Asset control and assessment

• Environment monitoring

*cisco 2013-2014

Page 71: Machine Learning - Challenges, Learnings & Opportunities

currently the biggest prescriptive analytics engine: contextual advertising

http://www.flashtalking.com/us/targeted-ads/

Page 72: Machine Learning - Challenges, Learnings & Opportunities

another one: marketplace and services recommendation engine

Page 73: Machine Learning - Challenges, Learnings & Opportunities

challenges of implementation

and

what we do with machine learning

Page 74: Machine Learning - Challenges, Learnings & Opportunities

did you follow waze's instructions during the first week?

Page 75: Machine Learning - Challenges, Learnings & Opportunities

would you buy a self-driving car that couldn’t drive itself in 99 percent of the country?

or that knew nearly nothing about parking, couldn’t be taken out in snow or heavy rain, and would drive straight over a gaping pothole?

if your answer is yes, then check out the google self-driving car, model year 2014

Page 76: Machine Learning - Challenges, Learnings & Opportunities

but

Page 77: Machine Learning - Challenges, Learnings & Opportunities

can we trust them enough?

Page 78: Machine Learning - Challenges, Learnings & Opportunities

the BIGGEST CHALLENGES in indonesia

Page 79: Machine Learning - Challenges, Learnings & Opportunities

DATA SETS

Page 80: Machine Learning - Challenges, Learnings & Opportunities

the current analytics technology

humans still do most of the process

Page 81: Machine Learning - Challenges, Learnings & Opportunities

the current challenges of big data analytics?

• heterogeneous data sources, systems and formats
• time consuming and complex data preparation process
• almost impossible task of integrating various kinds of data
• it requires experts to analyze big and complex data
• most of the user interactions are not intuitive

“Before performing analytics, data scientists must first format and prepare the raw data for analytics, often with more than 80% of the effort,” said Intel Corp. Research

Page 82: Machine Learning - Challenges, Learnings & Opportunities

what would it be like, if we could simplify the whole process?

Page 83: Machine Learning - Challenges, Learnings & Opportunities

hence our vision

we believe humans should not be bogged down by tedious matters.

by reimagining analytics, we envision the creation of intelligent machines that will free humans to focus on solving the world’s toughest problems.

Page 84: Machine Learning - Challenges, Learnings & Opportunities

intelligent machines that can help us collect the massive amount of data

automatically read and connect to any kind of data, including automatic machine to machine connections

structured data

printed invoices

social media conversation

Page 86: Machine Learning - Challenges, Learnings & Opportunities

then help us separate the signals from the noise

automatic data quality assessments, data cleansing and data filtering

regi

mita

gundam

x-men

Page 87: Machine Learning - Challenges, Learnings & Opportunities

then help us separate the signals from the noise

automatic data quality assessments, data cleansing and data filtering

regi

mita

gundam

Page 88: Machine Learning - Challenges, Learnings & Opportunities

complete the information and connect them all in a meaningful way

automatic data transformation, entity extraction, contextual profiling

regi

mita

gundam

Page 89: Machine Learning - Challenges, Learnings & Opportunities

complete the information and connect them all in a meaningful way

automatic data transformation, entity extraction, contextual profiling

regi

mita

gundam

batman

tom

mediatrac

Page 91: Machine Learning - Challenges, Learnings & Opportunities

and finally help us make sense of the massively connected data

contextual search and recommendation

intelligent data discovery

gundam

batman

sith

Page 92: Machine Learning - Challenges, Learnings & Opportunities

and finally help us make sense of the massively connected data

contextual search and recommendation

intelligent data discovery

regi

mita

gundam

batman

tom

mediatrac

gundam

batman

sith

Page 93: Machine Learning - Challenges, Learnings & Opportunities

through a highly intuitive and natural user interface

natural language interface

voice and gesture recognition

how many restaurants sell soto along jalan senopati?

Page 94: Machine Learning - Challenges, Learnings & Opportunities

Platform As A Service

intelligent machines

knowledge based artificial intelligence

contextual search and recommendation

contextual profiling and enrichment

automatic data integration

scalable big data infrastructure

digital, telco, legal, retail, healthcare, agriculture

Page 95: Machine Learning - Challenges, Learnings & Opportunities

knowledge based artificial intelligence

artificial general intelligence

the brain: knowledge graph

the intelligence: reasoning and learning

machine learning

heuristics

unsupervised

deep learning

NLP & image recognition

highly secured distributed graph database

distributed computing with GPU acceleration

personal brain

knowledge graph

highly secured personal graph database

Page 96: Machine Learning - Challenges, Learnings & Opportunities

automatic data integration

• multi format
• structured
• unstructured
• unclean
• missing data
• unstandardized
• unconnected
• difficult to analyze

• cleaned and standardized
• enriched and validated
• connected at granular level
• analytics ready

data

automatic data collection

automatic data preparation

automatic data integration

Page 99: Machine Learning - Challenges, Learnings & Opportunities

automatic data integration

territory management

Page 100: Machine Learning - Challenges, Learnings & Opportunities

all of our silo data will have a totally elevated value, once you connect them all in a meaningful way

Page 101: Machine Learning - Challenges, Learnings & Opportunities

are all of our current data connected yet?

Page 102: Machine Learning - Challenges, Learnings & Opportunities

Almost…

Page 103: Machine Learning - Challenges, Learnings & Opportunities

google is a humongous library index, with a smart library card search that redirects you to the original documents

Page 104: Machine Learning - Challenges, Learnings & Opportunities

facebook is a giant personal scrapbook of all your acquaintances that are currently linked by manual tagging and friends list

source:techglimpse

Page 105: Machine Learning - Challenges, Learnings & Opportunities

youtube and instagram are a huge repository of current knowledge, lifestyle and trends that are still largely unconnected

Page 106: Machine Learning - Challenges, Learnings & Opportunities

now imagine this!

Page 107: Machine Learning - Challenges, Learnings & Opportunities

when we can have intelligent machines that can connect everything, in a meaningful way…

we can start asking questions, on things we never thought possible to be asked before

Page 108: Machine Learning - Challenges, Learnings & Opportunities

Spotify can map songs across social graphs.

Shazam can give us situational data: where someone is listening to a song, when, how and even (to an extent) why.

YouTube can help us track the growth of a song using search and streams.

Instagram & Vine are becoming hotbeds for music discovery.

what if we could connect all their data together?

Page 109: Machine Learning - Challenges, Learnings & Opportunities

or if you have a radio station: what sort of playlist will appeal to your target audience, if we know that a sizeable percentage of them own a hummer?

Page 110: Machine Learning - Challenges, Learnings & Opportunities

we can even predict the specific combination of words, notes and beats that will increase a song's chance of reaching the billboard top 40 this upcoming season.

Page 111: Machine Learning - Challenges, Learnings & Opportunities

here are some samples of hidden insights that we can discover from our own large repository of data, using our intelligent data integration and data discovery tools

Page 112: Machine Learning - Challenges, Learnings & Opportunities

when we integrate historical media articles with geodemographic and point-of-interest databases, we can create a model that predicts a high probability of fire incidents down to street level

Page 114: Machine Learning - Challenges, Learnings & Opportunities

productivity optimization

automatic data integration

Page 115: Machine Learning - Challenges, Learnings & Opportunities

contextual profiling and enrichment

behavioral profiling

community detection

influence and networks

Page 116: Machine Learning - Challenges, Learnings & Opportunities

contextual search and recommendation

auto-correction

auto-complete

contextual rank

entity recognition

synonyms

personal, geo-demographic, historical, time/activity/mood

instant search

natural language

content

collaborative

influence

trending

similarity

popular

preference

search recommendation

contextual

optimization

predictive

potential area, distribution routing, marketing channel, segmentation, prediction

Page 117: Machine Learning - Challenges, Learnings & Opportunities

contextual search and recommendation

contextual auto complete

contextual auto correct

contextual entity extraction

and recommendation

Page 118: Machine Learning - Challenges, Learnings & Opportunities

contextual search and recommendation

analytic dashboard

contextual personalized pages

current and predicted trends

Page 119: Machine Learning - Challenges, Learnings & Opportunities

fraud detection

Page 120: Machine Learning - Challenges, Learnings & Opportunities

lessons learned including how to scale your ML

Page 153: Machine Learning - Challenges, Learnings & Opportunities

scalability problems - outline

large scale machine learning
• mahout: scalable ml on hadoop
• jubatus: distributed online real-time ml
• vowpal wabbit: fast learning at yahoo/ms
• trident ml and storm pattern: ml on storm, yarn
• upcoming: samoa, ml on s4, storm

issues in scalable distributed ml
• load balancing
• auto scaling
• job scheduling
• workflow management

data and model parallelism
• parameter server framework
• peer-to-peer framework
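The data-parallel, parameter-server idea in this outline can be sketched in a single process: each "worker" computes a gradient on its own shard of the data, and a central server averages those gradients and updates the shared weight. This is a toy under stated assumptions (one scalar weight, synchronous updates, workers simulated by a loop); `ParameterServer`, `shard_gradient` and `train` are hypothetical names, not the API of any framework listed above.

```python
def shard_gradient(w, shard):
    """Mean-squared-error gradient for the model y ≈ w * x on one data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


class ParameterServer:
    """Holds the shared weight; workers pull it and push gradients back."""

    def __init__(self, w=0.0, lr=0.05):
        self.w = w
        self.lr = lr

    def pull(self):
        return self.w

    def push(self, gradients):
        # synchronous update: average the workers' gradients, then step
        self.w -= self.lr * sum(gradients) / len(gradients)


def train(shards, steps=200):
    server = ParameterServer()
    for _ in range(steps):
        w = server.pull()
        # in a real cluster each shard's gradient is computed on a different machine
        gradients = [shard_gradient(w, shard) for shard in shards]
        server.push(gradients)
    return server.pull()


# toy data generated from y = 3x, split across two workers
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
learned_w = train(shards)
```

With this data `learned_w` converges to 3.0. Asynchronous schemes such as hogwild drop the "wait for every worker" step in `push`, trading exactness for throughput.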

Page 154: Machine Learning - Challenges, Learnings & Opportunities

scalability problems - outline

distributed deep learning
• yahoolda: scalable parallel framework in latent variable models
• distbelief: distributed deep learning on cluster
• h2o: distributed deep learning on spark
• adam at msr: distributed deep learning
• dl4j: open source for deep learning on hadoop and spark
• petuum: distributed machine learning
• singa: distributed deep learning
• tensorflow: google large scale distributed dl
• mxnet: heterogeneous distributed deep learning
• caffe on spark: yahoo

distributed learning and optimization
• proximal splitting/auxiliary coordinates
• bundle (sub-gradient)
• shotgun: parallelized cdm (coordinate descent method)
• asynchronous sgd
• hogwild/dogwild

Page 155: Machine Learning - Challenges, Learnings & Opportunities

what’s next?

Page 159: Machine Learning - Challenges, Learnings & Opportunities

emerging analytics technology for automatic analytics on large dimensional data

• online deep learning
• topological data analysis
• fuzzy-rough set based data exploration system
• granular computing
• kernel set and spatiotemporal analysis
• applied differential geometry
• non axiomatic reasoning system

• intelligent rule and knowledge extraction/discovery
• multi agent based modeling
• weak signal detection and analysis
• bayesian networks analysis
• genetic programming
• self organizing neural networks

Page 160: Machine Learning - Challenges, Learnings & Opportunities

and also more humanlike user interaction and data visualization technology

• eye tracking
• glasses-free autostereoscopy
• touch sensitive hologram
• natural language user interface
• tangible user interface
• wearable gestural interface
• brain-computer interface
• sensor network user interface

Page 161: Machine Learning - Challenges, Learnings & Opportunities

In the meantime

Page 162: Machine Learning - Challenges, Learnings & Opportunities

principles for the development of a complete mind:

study the science of art.

study the art of science.

develop your senses, especially learn how to see.

realize that everything connects to everything else.

Leonardo da Vinci