Hivemall LT @ Machine Learning Casual Talks #3
makoto-yui
-
Copyright 201 Treasure Data. All Rights Reserved.
Treasure Data Inc. Research Engineer @myui
2015/04/30 Machine Learning Casual Talks #3
Hivemall v0.3
http://myui.github.io/
-
2015/04 1ML as a Service (MLaaS)(?)
2015/03
2009/03 NAIST XML
H141
-
[Figure: monthly growth chart, Aug-12 through Oct-14, y-axis from 0 to 12,000. Annotated milestones: Series A Funding; Gartner Cool Vendor in Big Data.]
-
100+
15
4,000
500,0001
-
Hivemall on Apache Hadoop
Hadoop HDFS
MapReduce(MRv1)
Hive/PIG
Hivemall
Apache YARN
Apache Tez (DAG)
MR v2
github.com/myui/hivemall
-
SQL
Hivemall
Mahout
CREATE TABLE lr_model AS
SELECT
  feature, -- reducers perform model averaging in parallel
  avg(weight) as weight
FROM (
  SELECT logress(features, label, ..) as (feature, weight)
  FROM train
) t -- map-only task
GROUP BY feature; -- shuffled to reducers
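The query above trains in map-only tasks and averages weights per feature on the reducers. The following is a minimal Python sketch of that pattern, not Hivemall's implementation: `train_logistic_sgd` and `average_models` are hypothetical names, each "shard" stands in for one mapper's input split, and the learning rate and epoch count are arbitrary assumptions.

```python
import math
from collections import defaultdict


def train_logistic_sgd(rows, eta=0.1, epochs=5):
    """Train one local model on a shard (stand-in for a map-only task).

    rows: list of (features, label) with features as {name: value}
    and label in {0, 1}. Returns {feature: weight}.
    """
    w = defaultdict(float)
    for _ in range(epochs):
        for features, label in rows:
            z = sum(w[f] * v for f, v in features.items())
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability
            for f, v in features.items():
                w[f] += eta * (label - p) * v  # SGD step on the log loss
    return dict(w)


def average_models(models):
    """Average weights per feature, like GROUP BY feature + avg(weight).

    A feature is averaged only over the models that emitted it.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for model in models:
        for f, weight in model.items():
            sums[f] += weight
            counts[f] += 1
    return {f: sums[f] / counts[f] for f in sums}


# Two shards, trained independently, then averaged.
shard_a = [({"x": 1.0, "bias": 1.0}, 1), ({"x": -1.0, "bias": 1.0}, 0)]
shard_b = [({"x": 2.0, "bias": 1.0}, 1), ({"x": -2.0, "bias": 1.0}, 0)]
model = average_models([train_logistic_sgd(shard_a), train_logistic_sgd(shard_b)])
```

Because each shard is trained without coordination, the only communication is the final per-feature average, which is what lets the training step run as a map-only task.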
API: HiveQL. API is stable (Spark: unstable)
Hadoop
-
Hivemall v0.3
Classification:
  Perceptron
  Passive Aggressive (PA)
  Confidence Weighted (CW)
  Adaptive Regularization of Weight Vectors (AROW)
  Soft Confidence Weighted (SCW)
  AdaGrad+RDA
Regression:
  PA Regression
  AROW Regression
  AdaGrad
  AdaDELTA
Minhash and b-Bit Minhash (LSH variant)
Matrix Factorization
Feature engineering:
  Feature hashing
  Feature scaling (normalization, z-score)
  TF-IDF vectorizer
v0.35
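The feature engineering steps listed above (feature hashing, scaling, TF-IDF) can be illustrated with a short Python sketch. The function names are hypothetical and do not correspond to Hivemall UDFs, and CRC32 is used purely as a stand-in hash function; the hash Hivemall actually uses is not stated in these slides.

```python
import math
import zlib


def hash_feature(name, num_buckets=2 ** 20):
    """Hashing trick: map a feature name to a bucket index in [0, num_buckets)."""
    return zlib.crc32(name.encode("utf-8")) % num_buckets


def min_max(value, lo, hi):
    """Min-max normalization to [0, 1]; returns 0.0 for a constant column."""
    if hi == lo:
        return 0.0
    return (value - lo) / (hi - lo)


def zscore(value, mean, stddev):
    """Standardize a value; returns 0.0 when the column has no variance."""
    if stddev == 0:
        return 0.0
    return (value - mean) / stddev


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: tf * idf} dict per document.

    tf is the term's relative frequency in the document and
    idf is log(N / document frequency).
    """
    n = len(docs)
    df = {}
    for tokens in docs:
        for term in set(tokens):
            df[term] = df.get(term, 0) + 1
    result = []
    for tokens in docs:
        weights = {}
        for term in set(tokens):
            tf = tokens.count(term) / len(tokens)
            weights[term] = tf * math.log(n / df[term])
        result.append(weights)
    return result
```

Hashing bounds the model size regardless of how many distinct feature names appear, at the cost of occasional collisions between unrelated features.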
-
Matrix Factorization
Rank-k factor matrices P, Q
-
Matrix Factorization
Biased MF, SGD, AdaGrad
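As an illustration of biased matrix factorization trained by SGD, here is a minimal Python sketch. It assumes the common formulation in which a rating is predicted as global mean + user bias + item bias + dot(P[u], Q[i]); the function name, hyperparameters, and L2 regularization are assumptions, and it uses a fixed learning rate rather than AdaGrad. It is not Hivemall's implementation.

```python
import random


def train_biased_mf(ratings, k=2, eta=0.05, lam=0.02, epochs=200, seed=0):
    """Fit r_hat(u, i) = mu + b_u[u] + b_i[i] + dot(P[u], Q[i]) with plain SGD.

    ratings: list of (user, item, rating) tuples.
    Returns a predict(user, item) function for users and items seen in training.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    P, Q, b_u, b_i = {}, {}, {}, {}
    for u, i, _ in ratings:
        if u not in P:
            P[u] = [rng.gauss(0.0, 0.1) for _ in range(k)]
            b_u[u] = 0.0
        if i not in Q:
            Q[i] = [rng.gauss(0.0, 0.1) for _ in range(k)]
            b_i[i] = 0.0

    def predict(u, i):
        dot = sum(pf * qf for pf, qf in zip(P[u], Q[i]))
        return mu + b_u[u] + b_i[i] + dot

    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(u, i)
            # Gradient steps on squared error with L2 regularization.
            b_u[u] += eta * (err - lam * b_u[u])
            b_i[i] += eta * (err - lam * b_i[i])
            for f in range(k):
                p_old, q_old = P[u][f], Q[i][f]
                P[u][f] += eta * (err * q_old - lam * p_old)
                Q[i][f] += eta * (err * p_old - lam * q_old)
    return predict


ratings = [("u1", "i1", 5.0), ("u1", "i2", 3.0), ("u2", "i1", 4.0), ("u2", "i2", 2.0)]
predict = train_biased_mf(ratings)
```

The bias terms absorb per-user and per-item offsets, so the latent factors only need to model the remaining interaction.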
-
Matrix Factorization
-
Matrix Factorization/
-
1
2
N
-
create table kdd10a_pa1_model1 as
select
  feature,
  cast(voted_avg(weight) as float) as weight
from (
  select train_pa1(addBias(features), label, "-mix host01,host02,host03") as (feature, weight)
  from kdd10a_train_x3
) t
group by feature;
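The query above trains a Passive Aggressive (PA-I) classifier. For readers unfamiliar with the algorithm, here is a minimal Python sketch of the standard PA-I update rule; `pa1_update` is a hypothetical name, the aggressiveness parameter `c` is an assumed default, and the "bias" feature in the example only mimics what `addBias` appears to do. It does not reproduce Hivemall's `train_pa1`, `voted_avg`, or the `-mix` option.

```python
def pa1_update(weights, features, label, c=1.0):
    """Apply one PA-I step in place and return the weights dict.

    weights, features: {name: value}; label: -1 or +1.
    The step size is tau = min(c, hinge_loss / ||x||^2).
    """
    margin = label * sum(weights.get(f, 0.0) * v for f, v in features.items())
    loss = max(0.0, 1.0 - margin)  # hinge loss
    sq_norm = sum(v * v for v in features.values())
    if loss > 0.0 and sq_norm > 0.0:
        tau = min(c, loss / sq_norm)
        for f, v in features.items():
            weights[f] = weights.get(f, 0.0) + tau * label * v
    return weights


w = {}
pa1_update(w, {"x": 1.0, "bias": 1.0}, +1)  # loss 1, ||x||^2 = 2, tau = 0.5
pa1_update(w, {"x": 1.0, "bias": 1.0}, +1)  # margin is now 1, so no update
```

The update is "passive" when the example is already classified with margin at least 1 and "aggressive" otherwise, with `c` capping how far a single example can move the weights.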
MIX Server
Mix server
-
Model updates
Async add
AVG/Argmin KLD accumulator
hash(feature) % N
Non-blocking Channel (single shared TCP connection w/ TCP keepalive)
classifiers
Mix serv. Mix serv.
Computation/training is not being blocked
MIX Server
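The slide routes each model update to a mix server by `hash(feature) % N` and accumulates an average there. A toy, in-process Python sketch of that routing and averaging follows. It is synchronous and single-threaded, whereas the slide describes asynchronous adds over a non-blocking TCP channel; it covers only the AVG accumulator, not Argmin KLD; and `MixServer`, `route`, and the CRC32 hash are hypothetical stand-ins rather than Hivemall's classes.

```python
import zlib
from collections import defaultdict


class MixServer:
    """Toy stand-in for one mix server: keeps a running average per feature."""

    def __init__(self):
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def add(self, feature, weight):
        """Accumulate one weight update sent by a classifier."""
        self.sums[feature] += weight
        self.counts[feature] += 1

    def average(self, feature):
        """Return the averaged weight, or None if the feature was never seen."""
        count = self.counts.get(feature, 0)
        if count == 0:
            return None
        return self.sums[feature] / count


def route(feature, num_servers):
    """Pick the server responsible for a feature: hash(feature) % N."""
    return zlib.crc32(feature.encode("utf-8")) % num_servers


servers = [MixServer() for _ in range(3)]
updates = [("a", 1.0), ("a", 3.0), ("b", 2.0)]  # (feature, weight) from classifiers
for feature, weight in updates:
    servers[route(feature, len(servers))].add(feature, weight)
```

Because a given feature always hashes to the same server, every update for that feature is averaged in one place without any coordination between servers.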
-
Feature requirements in Treasure Data
-
Treasure Data / Kaggle Master / Data Scientists
[email protected] @myui