Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK
daniel-pyrathon
![Page 1: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/1.jpg)
MACHINE LEARNING AS A SERVICE
MAKING SENTIMENT PREDICTIONS IN REALTIME WITH ZMQ AND NLTK
![Page 2: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/2.jpg)
ABOUT ME
![Page 3: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/3.jpg)
DISSERTATION
Let's make something cool!
![Page 4: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/4.jpg)
SOCIAL MEDIA
+
MACHINE LEARNING
+
API
![Page 5: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/5.jpg)
SENTIMENT ANALYSIS AS A SERVICE
A STEP-BY-STEP GUIDE
![Page 6: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/6.jpg)
Fundamental Topics
  Machine Learning
  Natural Language Processing
Overview of the platform
The process
  Prepare
  Analyze
  Train
  Use
  Scale
![Page 7: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/7.jpg)
MACHINE LEARNING
WHAT IS MACHINE LEARNING?
A method of teaching computers to make and improve predictions or behaviors based on some data.
It allows computers to evolve behaviors based on empirical data.
Data can be anything:
  Stock market prices
  Sensors and motors
  Email metadata
![Page 8: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/8.jpg)
SUPERVISED MACHINE LEARNING
SPAM OR HAM
![Page 16: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/16.jpg)
NATURAL LANGUAGE PROCESSING
WHAT IS NATURAL LANGUAGE PROCESSING?
Interactions between computers and human languages
Extract information from text
Some NLTK features:
  Bigrams
  Part-of-speech tagging
  Tokenization
  Stemming
  WordNet lookup
![Page 17: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/17.jpg)
NATURAL LANGUAGE PROCESSING
SOME NLTK FEATURES
Tokenization

    >>> phrase = "I wish to buy specified products or service"
    >>> phrase = nlp.tokenize(phrase)
    >>> phrase
    ['I', 'wish', 'to', 'buy', 'specified', 'products', 'or', 'service']

Stopword Removal

    >>> phrase = nlp.remove_stopwords(phrase)
    >>> phrase
    ['I', 'wish', 'buy', 'specified', 'products', 'service']
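The `nlp` helper on the slide is not shown, so the two steps can only be approximated here. The sketch below uses the standard library alone; the regular expression and the tiny `STOPWORDS` set are assumptions made for illustration (NLTK ships a much longer English list).

```python
import re

# Tiny illustrative stopword set (an assumption); NLTK's English list is much longer.
STOPWORDS = {"a", "an", "and", "or", "the", "to", "of", "is"}


def tokenize(phrase):
    """Split a phrase into word tokens made of letters, digits and apostrophes."""
    return re.findall(r"[A-Za-z0-9']+", phrase)


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop tokens whose lowercase form is in the stopword set."""
    return [token for token in tokens if token.lower() not in stopwords]


tokens = tokenize("I wish to buy specified products or service")
filtered = remove_stopwords(tokens)
print(tokens)
print(filtered)
```

With this stopword set only "to" and "or" are dropped, which matches the output shown on the slide.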
![Page 18: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/18.jpg)
SENTIMENT ANALYSIS
![Page 19: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/19.jpg)
CLASSIFYING TWITTER SENTIMENT IS HARD
Improper language use
Spelling mistakes
140 characters to express sentiment
Different types of English (US, UK, Pidgin)
Example tweet from Donnie McClurkin (@Donnieradio), 7:04 PM - 21 Apr 2014:
Gr8 picutre..God bless u RT @WhatsNextInGosp: Resurrection Sunday Service @PFCNY with @Donnieradio pic.twitter.com/nOgz65cpY5
![Page 20: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/20.jpg)
BACK TO BUILDING OUR API.. FINALLY!
![Page 21: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/21.jpg)
CLASSIFIER
3 STEPS
![Page 22: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/22.jpg)
THE DATASET
SENTIMENT140
160,000 labelled tweets
CSV format
Polarity of the tweet (0 = negative, 2 = neutral, 4 = positive)
The text of the tweet (Lyx is cool)
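Reading such a file into (text, label) pairs could look roughly like the sketch below. It assumes the six-column layout commonly distributed for Sentiment140 (polarity, id, date, query, user, text); the two rows are made up, and the function name and the "neutral" label string are hypothetical.

```python
import csv
import io

# Two made-up rows in the assumed six-column layout:
# polarity, tweet id, date, query, user, text
SAMPLE_CSV = (
    '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","some_user",'
    '"Today was such a rainy and horrible day"\n'
    '"4","2","Mon Apr 06 22:19:49 PDT 2009","NO_QUERY","other_user",'
    '"Lyx is cool"\n'
)

# Polarity codes from the slide mapped to short class labels.
POLARITY_TO_LABEL = {"0": "neg", "2": "neutral", "4": "pos"}


def read_labelled_tweets(handle):
    """Yield (text, label) pairs: polarity is the first column, text the last."""
    for row in csv.reader(handle):
        polarity, text = row[0], row[-1]
        yield text, POLARITY_TO_LABEL[polarity]


tweets = list(read_labelled_tweets(io.StringIO(SAMPLE_CSV)))
print(tweets)
```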
![Page 23: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/23.jpg)
FEATURE EXTRACTION
How are we going to find features from a phrase?
"Bag of Words" representation

    my_phrase = "Today was such a rainy and horrible day"
    In [12]: from nltk import word_tokenize
    In [13]: word_tokenize(my_phrase)
    Out[13]: ['Today', 'was', 'such', 'a', 'rainy', 'and', 'horrible', 'day']
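A bag of words keeps which words occur and discards their order. A minimal sketch of that idea, with a hypothetical `bag_of_words` helper that records presence only:

```python
def bag_of_words(tokens):
    """Map each distinct lowercase token to True; order and counts are dropped."""
    return {token.lower(): True for token in tokens}


tokens = ['Today', 'was', 'such', 'a', 'rainy', 'and', 'horrible', 'day']
features = bag_of_words(tokens)
shuffled = bag_of_words(list(reversed(tokens)))

# The same words in a different order give the same representation.
print(features == shuffled)
```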
![Page 24: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/24.jpg)
FEATURE EXTRACTION
CREATE A PIPELINE OF FEATURE EXTRACTORS

    import functools

    import nltk
    from nltk.corpus import stopwords

    # `formatting` is the speaker's own module; each step takes and returns text
    FORMATTER = formatting.FormatterPipeline(
        formatting.make_lowercase,
        formatting.strip_urls,
        formatting.strip_hashtags,
        formatting.strip_names,
        formatting.remove_repetitons,
        formatting.replace_html_entities,
        formatting.strip_nonchars,
        functools.partial(
            formatting.remove_noise,
            stopwords=stopwords.words('english') + ['rt']
        ),
        functools.partial(
            formatting.stem_words,
            stemmer=nltk.stem.porter.PorterStemmer()
        )
    )
![Page 25: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/25.jpg)
FEATURE EXTRACTION
PASS THE REPRESENTATION DOWN THE PIPELINE

    In [11]: feature_extractor.extract("Today was such a rainy and horrible day")
    Out[11]: {'day': True, 'horribl': True, 'raini': True, 'today': True}

The result is a variable-length dictionary whose keys are the features and whose values are always True.
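The `formatting` module itself is not shown in the slides, so the following is only a guess at how such a pipeline might be composed. The class body, the `process` method name and the three step functions are all hypothetical; stemming is left out, so "horrible" is not reduced to "horribl" here.

```python
import functools
import re


class FormatterPipeline:
    """Apply a sequence of text -> text functions, one after another."""

    def __init__(self, *steps):
        self.steps = steps

    def process(self, text):
        for step in self.steps:
            text = step(text)
        return text


def make_lowercase(text):
    return text.lower()


def strip_urls(text):
    return re.sub(r"https?://\S+", "", text)


def remove_noise(text, stopwords):
    return " ".join(word for word in text.split() if word not in stopwords)


pipeline = FormatterPipeline(
    make_lowercase,
    strip_urls,
    # functools.partial fixes the extra argument so the step takes text only.
    functools.partial(remove_noise, stopwords={"was", "such", "a", "and", "rt"}),
)

cleaned = pipeline.process(
    "RT Today was such a rainy and horrible day http://example.com"
)
features = {word: True for word in cleaned.split()}
print(features)
```

Keeping every step as a plain function of text makes it easy to reorder, add or drop steps without touching the others.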
![Page 26: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/26.jpg)
DIMENSIONALITY REDUCTION
Remove features that are common across all classes (noise)
Increase performance of the classifier
Decrease the size of the model: less memory usage and more speed
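In code, the reduction step can be as small as a dictionary filter. The slides call a `reduce_features` function but do not show its body, so the implementation below is an assumption, and the contents of `best_features` are made up.

```python
def reduce_features(feature_vector, best_features):
    """Keep only the features that appear in the set of informative words."""
    return {
        feature: value
        for feature, value in feature_vector.items()
        if feature in best_features
    }


# Hypothetical set of informative (stemmed) words.
best_features = {"horribl", "raini", "love", "amaz"}
feature_vector = {"day": True, "horribl": True, "raini": True, "today": True}

reduced = reduce_features(feature_vector, best_features)
print(reduced)
```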
![Page 27: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/27.jpg)
DIMENSIONALITY REDUCTION
CHI-SQUARE TEST
![Page 32: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/32.jpg)
DIMENSIONALITY REDUCTION
CHI-SQUARE TEST
NLTK gives us BigramAssocMeasures.chi_sq

    from nltk.metrics import BigramAssocMeasures

    word_scores = {}

    # Calculate the number of words for each class
    pos_word_count = label_word_fd['pos'].N()
    neg_word_count = label_word_fd['neg'].N()
    total_word_count = pos_word_count + neg_word_count

    # For each word and its total occurrence count
    for word, freq in word_fd.items():
        # Calculate a score for the positive class
        pos_score = BigramAssocMeasures.chi_sq(
            label_word_fd['pos'][word], (freq, pos_word_count), total_word_count)

        # Calculate a score for the negative class
        neg_score = BigramAssocMeasures.chi_sq(
            label_word_fd['neg'][word], (freq, neg_word_count), total_word_count)

        # The sum of the two gives the word's total score
        word_scores[word] = pos_score + neg_score
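To see what the score measures, here is a plain-Python stand-in that computes the standard Pearson chi-square statistic on a 2x2 word/class table and keeps the highest-scoring words. It is not NLTK's implementation; the word counts and the cutoff of four words are made up for illustration.

```python
from collections import Counter


def chi_sq(n_ii, n_ix, n_xi, n_xx):
    """Pearson chi-square statistic for a 2x2 word/class contingency table.

    n_ii: occurrences of the word in the class
    n_ix: occurrences of the word in all classes
    n_xi: total words in the class
    n_xx: total words in all classes
    """
    n_io = n_ix - n_ii                 # the word, in the other class
    n_oi = n_xi - n_ii                 # other words, in this class
    n_oo = n_xx - n_ii - n_io - n_oi   # other words, in the other class
    denominator = (n_ii + n_io) * (n_ii + n_oi) * (n_io + n_oo) * (n_oi + n_oo)
    if denominator == 0:
        return 0.0
    return n_xx * (n_ii * n_oo - n_io * n_oi) ** 2 / denominator


# Made-up word counts per class.
counts = {
    "pos": Counter({"love": 8, "day": 5, "amaz": 7}),
    "neg": Counter({"horribl": 9, "day": 5, "raini": 6}),
}
word_totals = counts["pos"] + counts["neg"]
class_totals = {label: sum(c.values()) for label, c in counts.items()}
total = sum(class_totals.values())

# Score each word as the sum of its per-class statistics.
word_scores = {
    word: sum(
        chi_sq(counts[label][word], freq, class_totals[label], total)
        for label in counts
    )
    for word, freq in word_totals.items()
}

# Keep the four highest-scoring words as the informative features.
best_features = set(sorted(word_scores, key=word_scores.get, reverse=True)[:4])
print(word_scores["day"], best_features)
```

"day" appears equally often in both classes, so it scores zero and is dropped, while the words that occur in only one class are kept.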
![Page 33: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/33.jpg)
TRAINING
Now that we can extract features from text, we can train a classifier. One of the simplest and most flexible learning algorithms for text classification is Naive Bayes.
P(label | features) = P(label) * P(features | label) / P(features)
Simple to compute = fast
Assumes feature independence = easy to update
Supports multiclass = scalable
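The formula can be turned into a very small classifier. The sketch below is an illustration rather than NLTK's implementation: it scores only the features present in the phrase, uses add-one smoothing (an assumption), and drops P(features) because it is the same for every label. The training pairs are made up.

```python
import math
from collections import Counter, defaultdict


def train(labelled_features):
    """Count labels and per-label feature occurrences from (features, label) pairs."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)
    for features, label in labelled_features:
        label_counts[label] += 1
        for feature in features:
            feature_counts[label][feature] += 1
    return label_counts, feature_counts


def classify(features, label_counts, feature_counts):
    """Return the label maximising log P(label) + sum of log P(feature | label)."""
    total = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        score = math.log(count / total)
        for feature in features:
            # Add-one smoothing so unseen features never give probability zero.
            score += math.log((feature_counts[label][feature] + 1) / (count + 2))
        scores[label] = score
    return max(scores, key=scores.get)


train_feats = [
    ({"love": True, "amaz": True}, "pos"),
    ({"love": True, "day": True}, "pos"),
    ({"horribl": True, "raini": True}, "neg"),
    ({"horribl": True, "day": True}, "neg"),
]
model = train(train_feats)
print(classify({"love": True, "day": True}, *model))
```

Because the model is just counts, new labelled examples can be folded in by incrementing the counters, which is the "easy to update" property.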
![Page 34: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/34.jpg)
TRAINING
NLTK provides built-in components
1. Train the classifier
2. Serialize classifier for later use
3. Train once, use as much as you want

    >>> from nltk.classify import NaiveBayesClassifier
    >>> # Training takes a long time
    >>> nb_classifier = NaiveBayesClassifier.train(train_feats)
    >>> nb_classifier.labels()
    ['neg', 'pos']
    >>> serializer.dump(nb_classifier, file_handle)
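The slide does not say what `serializer` is; assuming it behaves like the standard `pickle` module, the train-once, load-later round trip looks like this. A plain dictionary stands in for the trained classifier, and an in-memory buffer stands in for the file.

```python
import io
import pickle

# Stand-in for a trained model; any picklable object behaves the same way.
trained_model = {"labels": ["neg", "pos"], "word_scores": {"love": 20.0}}

# Serialize once after training (a real service would write to a file on disk).
file_handle = io.BytesIO()
pickle.dump(trained_model, file_handle)

# Later, load the model back instead of training again.
file_handle.seek(0)
restored_model = pickle.load(file_handle)
print(restored_model == trained_model)
```

Unpickling can execute arbitrary code, so a service should only load model files it produced itself.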
![Page 35: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/35.jpg)
USING THE CLASSIFIER

    import pickle

    # Load the classifier from the serialized file
    classifier = pickle.loads(classifier_file.read())

    # Pick a new phrase
    new_phrase = "At Pycon Italy! Love the food and this speaker is so amazing"

    # 1) Preprocessing
    feature_vector = feature_extractor.extract(new_phrase)

    # 2) Dimensionality reduction, best_features is our set of best words
    reduced_feature_vector = reduce_features(feature_vector, best_features)

    # 3) Classify!
    print(classifier.classify(reduced_feature_vector))
    # Result shown on the slide: "pos"
![Page 36: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/36.jpg)
BUILDING A CLASSIFICATION API
Classifier is slow, no matter how much optimization is made
Classifier is a blocking process, API must be event-driven
![Page 37: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/37.jpg)
BUILDING A CLASSIFICATION API
SCALING TOWARDS INFINITY AND BEYOND
![Page 38: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/38.jpg)
BUILDING A CLASSIFICATION API
ZEROMQ
Fast, uses native sockets
Promotes horizontal scalability
Language-agnostic framework
![Page 39: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/39.jpg)
BUILDING A CLASSIFICATION API
ZEROMQ

    ...
    socket = context.socket(zmq.REP)
    ...
    while True:
        message = socket.recv()
        phrase = json.loads(message)["text"]

        # 1) Feature extraction
        feature_vector = feature_extractor.extract(phrase)

        # 2) Dimensionality reduction, best_features is our set of best words
        reduced_feature_vector = reduce_features(feature_vector, best_features)

        # 3) Classify!
        result = classifier.classify(reduced_feature_vector)
        # send_string encodes the JSON text to bytes before sending
        socket.send_string(json.dumps(result))
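One way to keep the worker loop testable is to separate the message handling from the socket code. The sketch below assumes pyzmq is installed for `serve` and `ask`; port 5555, the function names and `toy_classify` (a stand-in for the feature extraction and classifier steps) are hypothetical.

```python
import json


def handle_request(message, classify):
    """Decode a JSON request, classify its text, and encode the JSON reply."""
    phrase = json.loads(message)["text"]
    return json.dumps(classify(phrase)).encode("utf-8")


def serve(classify, endpoint="tcp://*:5555"):
    """Answer classification requests forever on a REP socket."""
    import zmq  # requires pyzmq; imported here so handle_request works without it

    socket = zmq.Context.instance().socket(zmq.REP)
    socket.bind(endpoint)
    while True:
        socket.send(handle_request(socket.recv(), classify))


def ask(text, endpoint="tcp://localhost:5555"):
    """Send one phrase to the service from a REQ socket and return the label."""
    import zmq

    socket = zmq.Context.instance().socket(zmq.REQ)
    socket.connect(endpoint)
    socket.send(json.dumps({"text": text}).encode("utf-8"))
    return json.loads(socket.recv())


def toy_classify(phrase):
    """Hypothetical stand-in for feature extraction plus the trained classifier."""
    return "pos" if "love" in phrase.lower() else "neg"


reply = handle_request(b'{"text": "Love the food"}', toy_classify)
print(reply)
```

Running `serve(toy_classify)` in one process and `ask("Love the food")` in another would exercise the full request/reply round trip.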
![Page 40: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/40.jpg)
DEMO
![Page 41: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/41.jpg)
POST-MORTEM
Real-time sentiment analysis APIs can be implemented, and can be scalable
What if we use Redis instead of having serialized classifiers?
Deep learning is giving very good results in NLP, let's try it!
![Page 42: Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK](https://reader034.fdocuments.net/reader034/viewer/2022042814/554e1b7cb4c90571798b498b/html5/thumbnails/42.jpg)
FIN
QUESTIONS