Applying Data Science to Move Beyond Keywords for Social Analysis

Post on 15-Apr-2017

Applying Data Science to Move Beyond Keywords for Social Analysis

Richard Caudle, Director, Developer Relations

Claudio Weeraratne, Director, Product Management

DATASIFT FORUM

RUN ON THE BANKS?

AMBIGUITY OF NATURAL LANGUAGE

MOVING BEYOND KEYWORDS

bank (similarity x) AND withdraw (similarity y)

interaction.content any "rbs,lloyds,hsbc,barclays"
AND interaction.content any "withdraw,close,cashpoint,atm"
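The keyword filter above can be approximated in a few lines of Python to show where it breaks down. This is only a rough sketch: the `matches_any` helper is a hypothetical stand-in for the `any` test, not DataSift's implementation, and the second and third posts are invented for illustration.

```python
import re

def tokens(text):
    """Lowercase a post and split it into word tokens (the '#' of a hashtag is dropped)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def matches_any(text, keywords):
    """Hypothetical stand-in for a keyword 'any' test: true if any keyword is a token."""
    return bool(tokens(text) & set(keywords))

BANKS = ["rbs", "lloyds", "hsbc", "barclays"]
ACTIONS = ["withdraw", "close", "cashpoint", "atm"]

def keyword_filter(text):
    """Both keyword lists must match, mirroring the AND in the filter above."""
    return matches_any(text, BANKS) and matches_any(text, ACTIONS)

posts = [
    "I'm heading to #rbs to close my account",         # relevant, matched
    "Top RBs this week were close to 100 yards each",  # running backs, still matched
    "Taking all my money out of the bank today",       # relevant, but no listed keyword
]
results = [keyword_filter(p) for p in posts]
print(results)
```

The second post is a false positive caused by the ambiguity of "rbs", and the third is a miss because neither keyword list anticipates its wording.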

CONCEPT MODELING

KEYWORD RELATIONSHIPS

CONCEPT MODEL

VECTOR SPACE

rbs, #rbs, runningbacks, #hsbc

OUR APPROACH

• Produce a vector space where words are grouped by their context
• Context of a word is given by surrounding words
• Perform unsupervised machine learning to learn topics
• word2vec is a well known implementation
• gensim is a Python library that simplifies word2vec usage
• Resulting model is queryable for similarity (of word vectors)
• Language-agnostic solution
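The idea that a word's context is given by its surrounding words can be illustrated without any machine learning library. The sketch below deliberately does not use word2vec or gensim: it swaps in plain co-occurrence counts, a much simpler technique, to build one vector per word and compare words by cosine similarity. The function names, the window size, and the three sample posts are assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase a post and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def context_vectors(posts, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for post in posts:
        words = tokenize(post)
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Invented sample posts; a real model would be built from a large volume of posts.
posts = [
    "heading to rbs to close my account",
    "heading to hsbc to close my account",
    "lovely weather in the park today",
]
vectors = context_vectors(posts)
print(cosine(vectors["rbs"], vectors["hsbc"]))     # shared contexts, high similarity
print(cosine(vectors["rbs"], vectors["weather"]))  # no shared contexts
```

Words that appear in the same contexts end up with similar vectors even though they never co-occur, which is the property the approach relies on. word2vec learns dense vectors with a neural network rather than counting, but the resulting model is queried for similarity in the same spirit.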

LEARNING SIMILARITY

Learn to predict a word from surrounding words

"I'm heading to #rbs to close my account"

rbs, account, close

NEURAL NETWORK (1000's posts)

CONCEPT 'BANK': rbs, account, close, hsbc, barclays, withdraw, balance, cash, money
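Predicting a word from its surrounding words means turning each post into (context, target) training examples; the word2vec literature calls this setup continuous bag-of-words (CBOW). The sketch below only shows how such pairs could be extracted from the example post. The window size of 2 and the function name are assumptions, and no network is trained here.

```python
import re

def training_pairs(text, window=2):
    """Return (context_words, target_word) pairs for every position in the post."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    pairs = []
    for i, target in enumerate(words):
        # Up to `window` words on each side of the target form its context.
        context = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs

pairs = training_pairs("I'm heading to #rbs to close my account")
for context, target in pairs:
    print(context, "->", target)
```

For the target "rbs" the context is "heading", "to", "to", "close". Repeated over thousands of posts, banks such as rbs, hsbc, and barclays are predicted from similar contexts and so end up close together in the vector space.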

DEMO

IMPROVED FILTERING & CLASSIFICATION

interaction.content similar "bank,hsbc:0.7"
AND interaction.content similar "withdraw:0.8"

interaction.content any "rbs,lloyds,hsbc,barclays"
AND interaction.content any "withdraw,close,cashpoint,atm"

CONCISE

INTUITIVE

MAINTAINABLE

UP-TO-DATE

HIGHER COVERAGE

ACCURACY
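One way a similarity-based filter like the one above might be evaluated is sketched below. Everything here is an assumption: the slides do not define the semantics of `similar`, so the sketch treats it as "some word in the post has cosine similarity at or above the threshold to some seed word", with the threshold applied to every seed in the list. The three-dimensional vectors are made-up values for illustration, not output from a trained model.

```python
import math
import re

# Made-up toy vectors (illustrative only); a real model would supply learned vectors.
VECTORS = {
    "bank":      (1.0, 0.1, 0.0),
    "hsbc":      (0.9, 0.2, 0.1),
    "rbs":       (0.9, 0.1, 0.2),
    "withdraw":  (0.1, 1.0, 0.0),
    "cashpoint": (0.3, 0.9, 0.1),
    "yards":     (0.0, 0.1, 1.0),
}

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def similar(text, seeds, threshold):
    """Assumed semantics: any known word in the post is close enough to any seed."""
    words = [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w in VECTORS]
    return any(cosine(VECTORS[w], VECTORS[s]) >= threshold for w in words for s in seeds)

def concept_filter(text):
    """Mirrors the two-clause similarity filter above under the assumed semantics."""
    return similar(text, ["bank", "hsbc"], 0.7) and similar(text, ["withdraw"], 0.8)

print(concept_filter("Long queue at the bank cashpoint"))
print(concept_filter("Three rbs ran for 100 yards"))
```

Under these assumptions the first post passes because "cashpoint" sits close to the seed "withdraw", while the second fails the second clause. The seed lists stay short because related terms are picked up through the model instead of being enumerated by hand.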

IMPROVING OUR PLATFORM

• Further validation of approach
• Operationalization of model production
• Creation of new models for different audiences
• Automated updating of models
• Implementation of 'similarity' in CSDL

Q&A

LEARN MORE: datasift.com/forum

THANK YOU