What is jubatus (short)
What is Jubatus? How does it work for you?
NTT SIC
Hiroki Kumazaki
Jubatus is…
• A Distributed Online Machine-Learning framework
  – An OSS developed in Japan
• GPL2.0
• Distributed
  – Fault tolerance
  – Scale out
• Online
  – Fixed-time computation
• Machine-Learning
  – More than “word count”!
Architecture
• The ML model is combined with a feature extractor
[Diagram: Jubatus Server containing a Feature Extractor and a Machine Learning Model, accessed via Jubatus RPC]
Architecture
• Multi-language client libraries
  – gem, pip, cpan, maven: ready!
  – Essentially, they use MessagePack-RPC.
• So you can use OCaml, Haskell, JavaScript, or Go at your own risk.
[Diagram: Client connected to the server via Jubatus RPC]
Architecture
• Many ML algorithms
  – Classifier
  – Recommender
  – Anomaly Detection
  – Clustering
  – Regression
  – Graph Mining
Useful!
Classifier
• Task: classification of a datum

    import sys

    def fib(a):
        if a == 1 or a == 0:
            return 1
        else:
            return fib(a-1) + fib(a-2)

    if __name__ == "__main__":
        print(fib(int(sys.argv[1])))

    def fib(a)
      if a == 1 or a == 0
        1
      else
        return fib(a-1) + fib(a-2)
      end
    end

    if __FILE__ == $0
      puts fib(ARGV[0].to_i)
    end

Sample task: classify which programming language is used
It’s [Python logo]    It’s [Ruby logo]
Classifier
• Set the configuration in the Jubatus server
[Diagram: Classifier containing a Feature Extractor, which this JSON configures]

    "converter": {
      "string_types": {
        "bigram": { "method": "ngram", "char_num": "2" }
      },
      "string_rules": [
        { "key": "*", "type": "bigram", "sample_weight": "tf", "global_weight": "idf" }
      ]
    }
Classifier
• Configuration JSON
  – It does the “feature vector design”
  – A very important step for machine learning

    "converter": {
      "string_types": {
        "bigram": { "method": "ngram", "char_num": "2" }
      },
      "string_rules": [
        { "key": "*", "type": "bigram", "sample_weight": "tf", "global_weight": "idf" }
      ]
    }

Annotations on the JSON:
  – settings for extracting features from strings
  – define a function named “bigram”
  – “ngram” is a built-in function
  – pass “2” to “ngram” to create “bigram”
  – apply “bigram” to all data
  – feature weights are based on tf/idf (see Wikipedia: tf-idf)
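To make the "tf" / "idf" weighting concrete, here is a minimal sketch in Python. It assumes the textbook definitions (tf = count of a bigram within one sample, idf = log(N / number of samples containing it)) and computes idf over a fixed batch. Jubatus works online, and its exact formula may differ, so the helper names `bigrams` and `tfidf` are illustrative only and not part of any Jubatus API.

```python
import math
from collections import Counter


def bigrams(text):
    """Count adjacent character pairs (assumed behaviour of "ngram" with char_num = 2)."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def tfidf(docs):
    """Weight each bigram by tf * idf, with idf = log(N / document frequency)."""
    counts = [bigrams(d) for d in docs]
    # Document frequency: in how many samples does each bigram occur?
    df = Counter(key for c in counts for key in c)
    n = len(docs)
    return [{k: tf * math.log(n / df[k]) for k, tf in c.items()} for c in counts]


# Three one-line samples; the first two look like Python, the third like Perl.
weights = tfidf(["def fib(a):", "def fib(a)", "sub fib {"])

# "fi" occurs in every sample, so it carries no information and gets weight 0.
# "):" occurs in only one sample, so it gets the largest weight there.
print(weights[0]["fi"], weights[0]["):"])
```

Bigrams shared by every language are pushed toward zero, while rare, language-specific bigrams stand out.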
Classifier
• The Feature Extractor becomes a “bigram extractor”
[Diagram: Classifier containing the bigram extractor]
Feature Extractor
• What does the bigram extractor do?
[Diagram: this source code passes through the bigram extractor and comes out as a feature vector]

    import sys

    def fib(a):
        if a == 1 or a == 0:
            return 1
        else:
            return fib(a-1) + fib(a-2)

    if __name__ == "__main__":
        print(fib(int(sys.argv[1])))

key value
im 1
mp 1
po 1
... ...
): 1
... ...
de 1
ef 1
... ...
Feature Vector
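The key/value table above can be reproduced with a few lines of Python. This is a sketch of character-bigram counting only. It assumes whitespace counts as an ordinary character, and the name `char_bigrams` is hypothetical and does not come from Jubatus.

```python
from collections import Counter


def char_bigrams(text):
    """Return a {bigram: count} feature vector over adjacent character pairs."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


features = char_bigrams("import sys")

# The first keys match the slide: "im", "mp", "po", each seen once.
for key in ("im", "mp", "po"):
    print(key, features[key])
```

A string of n characters yields n - 1 bigrams, so "import sys" (10 characters) produces 9 entries.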
Classifier
• Train the model with feature vectors
[Diagram: three feature vectors fed into the Classifier]

key  value    key  value    key  value
im   1        pu   1        @a   1
mp   1        ut   1        $_   1
po   1        ...  ...      ...  ...
...  ...      {|   ...      my   ...
):   1        |m   1        su   1
...  ...      m|   1        ub   1
de   1        {|   1        us   1
ef   1        en   1        se   1
...  ...      nd   1        ...  ...
Classifier
• Set the configuration in the Jubatus server
[Diagram: Classifier containing the bigram extractor as its Feature Extractor]

    "method": "AROW",
    "parameter": {
      "regularization_weight": 1.0
    }

Classifier Algorithms
• Perceptron
• Passive Aggressive
• Confidence Weighted
• Adaptive Regularization of Weights
• Normal Herd
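All of the algorithms listed above are online linear classifiers: they see one sample at a time and adjust per-feature weights. As an illustration, here is a minimal multi-class perceptron, the simplest entry in the list. It is not AROW and not Jubatus's implementation. The class name `SparsePerceptron` and the tiny hand-written feature vectors are assumptions made for the example.

```python
from collections import defaultdict


class SparsePerceptron:
    """Multi-class perceptron over sparse {feature: value} dicts."""

    def __init__(self):
        # label -> feature -> weight
        self.weights = defaultdict(dict)

    def score(self, label, features):
        w = self.weights[label]
        return sum(w.get(k, 0.0) * v for k, v in features.items())

    def classify(self, features):
        """Return the best-scoring known label, or None before any training."""
        if not self.weights:
            return None
        return max(self.weights, key=lambda label: self.score(label, features))

    def train(self, label, features):
        """One online step: update the weights only when the prediction is wrong."""
        self.weights[label]  # make sure the label is known
        predicted = self.classify(features)
        if predicted != label:
            for k, v in features.items():
                right, wrong = self.weights[label], self.weights[predicted]
                right[k] = right.get(k, 0.0) + v
                wrong[k] = wrong.get(k, 0.0) - v


model = SparsePerceptron()
python_fv = {"de": 1, "ef": 1, "):": 1}
ruby_fv = {"de": 1, "ef": 1, "en": 1, "nd": 1}
model.train("python", python_fv)
model.train("ruby", ruby_fv)
model.train("python", python_fv)

print(model.classify({"):": 1, "de": 1}))
print(model.classify({"en": 1, "nd": 1}))
```

After these three updates the shared bigrams "de" and "ef" end up with zero weight for both labels, while "):" and "en"/"nd" become the clues that separate them.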
Classifier
• Use the model for the classification task
  – Jubatus will find clues for classification
[Diagram: feature vector fed into AROW, which outputs a language]
key value
si 1
il 1
... ...
{| 1
... ...
It’s [language logo]
Classifier
• Use the model for the classification task
  – Jubatus will find clues for classification
[Diagram: feature vector fed into AROW, which outputs a language]
key value
re 1
): 1
... ...
s[ 1
... ...
It’s [language logo]
Via RPC
• Invoke feature extraction and classification from the client via RPC
[Diagram: client call passing through the bigram extractor and AROW on the server]

    lang = client.classify([sourcecode])

    import sys

    def fib(a):
        if a == 1 or a == 0:
            return 1
        else:
            return fib(a-1) + fib(a-2)

    if __name__ == "__main__":
        print(fib(int(sys.argv[1])))

key value
im 1
mp 1
po 1
... ...
): 1
... ...
de 1
ef 1
... ...
It may be [Python logo]
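"It may be" hints that the answer is a scored guess, not a certainty. The sketch below shows how a client might turn scores into one label per input. The response shape (one list of (label, score) pairs per input datum) is an assumption made for illustration, and `pick_labels` and `fake_response` are hypothetical names, not part of the Jubatus client.

```python
def pick_labels(results):
    """Return the top-scoring label per datum, or None when no scores came back."""
    return [max(r, key=lambda pair: pair[1])[0] if r else None for r in results]


# Hypothetical scores for one input, standing in for a real server response.
fake_response = [[("python", 1.8), ("ruby", -0.7), ("perl", -1.1)]]

print(pick_labels(fake_response))
```

Keeping the full score list around, instead of only the winner, lets the caller see how close the runner-up was.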
What can the classifier do?
• You can
  – estimate the topic of tweets
  – trash spam mail automatically
  – monitor server failures from syslog
  – estimate user sentiment from blog posts
  – detect malicious attacks
  – find which feature is the best clue for classification
How to use?
• See the examples in http://github.com/jubatus/jubatus-example
  – gender
  – shogun
  – malware classification
  – language detection