Jinchao demo v3

SEARCH YOUR TWEETS: SEARCH LIKE A PROFESSIONAL

Transcript of Jinchao demo v3


Motivation

• Twitter represents a rich flow of information
• There is no effective way to query Twitter
• It is hard to monitor topics of interest in real time

Search Tweets Like a Professional

A real-time Twitter search engine that lets you search based on:
• Keywords
◦ Country
◦ Language
◦ Negative words
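As a rough illustration of how the four search inputs could map onto one ElasticSearch query, here is a minimal sketch. It is not the demo's actual code: the field names `text`, `country`, and `lang` are assumptions about the tweet mapping, and the `bool`/`filter` form assumes a reasonably recent ElasticSearch query DSL.

```python
def build_search_query(keywords, country=None, language=None, negative_words=None):
    """Build an ElasticSearch bool query body from the four search inputs.

    Field names ("text", "country", "lang") are hypothetical.
    """
    # Keywords must match the tweet text.
    query = {"bool": {"must": [{"match": {"text": keywords}}]}}

    # Country and language narrow the results without affecting scoring.
    filters = []
    if country:
        filters.append({"term": {"country": country}})
    if language:
        filters.append({"term": {"lang": language}})
    if filters:
        query["bool"]["filter"] = filters

    # A tweet containing any negative word is excluded.
    if negative_words:
        query["bool"]["must_not"] = [
            {"match": {"text": " ".join(negative_words)}}
        ]
    return {"query": query}
```

For example, `build_search_query("deep learning", country="US", negative_words=["spam"])` returns a body with one `must` clause, one `filter` clause, and one `must_not` clause.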

Demo (http://searchyourtweet.info:5000/input)

Keep an eye on your topics of interest
• Not just searching historical tweets
• Tell us what you are interested in, and we will keep you updated on the newest events
• More technical detail on this later
• Video (https://youtu.be/GdRmXNfukos)

Data pipeline

[Diagram: Query Controller, Backend Database, Percolator, Logic Layer, Frontend, Searching Database, Data Backup, and Pub/Sub. Labels: register query, searching, publish, matching query.]

Challenge

Connect backend data pipeline:
◦ How to connect Kafka with ElasticSearch?
◦ Tried the elasticsearch-river-kafka plugin, without success
◦ Solution: use Logstash!
◦ Advantages:
◦ Easy to use
◦ Highly scalable
◦ Works with different data sources and destinations

An example of Logstash and a queue in a production environment

Challenge

Percolator:
◦ Use case: alerting on and monitoring documents
◦ Think of it as “search in reverse”
◦ Users register queries with the percolator
◦ The percolator matches incoming documents against the registered queries
◦ How to design the percolator data pipeline?
◦ How to decouple the backend database from the frontend server?
◦ Use the publish/subscribe design pattern
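To make "search in reverse" concrete, here is a toy in-memory percolator. It only sketches the idea and is not the ElasticSearch percolator: the class name `ToyPercolator` and its whole-word matching rule are made up for illustration.

```python
class ToyPercolator:
    """Stores queries first, then matches each incoming document against them."""

    def __init__(self):
        # query_id -> (words that must appear, words that must not appear)
        self._queries = {}

    def register(self, query_id, keywords, negative_words=()):
        """Register a query ahead of time, before any document arrives."""
        self._queries[query_id] = (
            {w.lower() for w in keywords},
            {w.lower() for w in negative_words},
        )

    def percolate(self, document):
        """Return the sorted ids of every registered query the document matches."""
        words = set(document.lower().split())
        return sorted(
            query_id
            for query_id, (required, forbidden) in self._queries.items()
            if required <= words and not (forbidden & words)
        )
```

A normal search stores documents and runs one query against them. Here the roles are swapped: after `register("q1", ["kafka"])`, calling `percolate("Kafka in production")` reports that the new document matches `q1`.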

Percolator Pipeline

[Diagram: Percolator, Query database, Twitter database, Controller, and Pub/Sub. Labels: new incoming tweets, publish, subscribe, open channel.]

Data flow of percolator

• query_controller constructs the percolator query from the user's input and passes it to the ElasticSearch percolator. query_controller also opens a Redis channel for this topic.
• query_controller fetches the latest tweets from ElasticSearch every 5 seconds (current setting) and sends them to the percolator for matching.
• For each tweet, the percolator tells us whether it matches any registered query. query_controller pushes the tweet to the right Redis channel based on this information.
• In the frontend, the Flask server subscribes to the Redis channel and receives the percolator's updates.
• For this demo, to keep the frontend UI simple, all tweets are directed to the default Redis channel.
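The routing step of this data flow could look roughly like the sketch below. It is an illustration, not the demo's query_controller: `route_tweets`, `DEFAULT_CHANNEL`, and the `percolate` / `publish` / `channel_for` callables are hypothetical names, passed in so the logic can run without ElasticSearch or Redis.

```python
import json

DEFAULT_CHANNEL = "default"  # hypothetical name for the demo's default channel


def route_tweets(tweets, percolate, publish, channel_for=None):
    """Publish each tweet once for every registered query it matches.

    percolate(tweet) -> iterable of matching query ids (assumed interface).
    publish(channel, message) -> sends one message; with redis-py this
        could be the `publish` method of a `redis.Redis` client.
    channel_for(query_id) -> channel name; when omitted, everything goes
        to DEFAULT_CHANNEL, as in the demo.
    Returns the number of messages published.
    """
    published = 0
    for tweet in tweets:
        for query_id in percolate(tweet):
            channel = channel_for(query_id) if channel_for else DEFAULT_CHANNEL
            publish(channel, json.dumps(tweet))
            published += 1
    return published
```

A controller would call this on each batch of newly fetched tweets, for instance in a loop that sleeps 5 seconds between batches. Keeping `publish` as a parameter is what decouples the backend from the frontend: the Flask server only needs to know the channel name.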

Challenge

• Real-time update on the frontend:
◦ How to keep posting Redis messages from the Flask server to the client in real time (solved with a very hacky solution)
• Constructing the ElasticSearch query
• Fine-tuning ElasticSearch (there was not enough time to fine-tune the ElasticSearch mapping)
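For the real-time update problem, one common option is Server-Sent Events (SSE). The sketch below is not the solution used in the demo, which the slides do not describe. It only shows how messages could be framed for SSE, and `sse_events` is a hypothetical helper name.

```python
import json


def sse_events(messages):
    """Yield each message as one Server-Sent Events frame.

    JSON-encoding keeps every payload on a single line, so a newline
    inside a tweet cannot break the "data: ...<blank line>" framing.
    """
    for message in messages:
        yield "data: " + json.dumps(message) + "\n\n"


# In a Flask view this generator could be returned as a streaming response:
#     Response(sse_events(messages), mimetype="text/event-stream")
# where `messages` is an iterable fed by the Redis subscription, and the
# browser reads the stream with the standard EventSource API.
```

With this approach the server pushes each new tweet as soon as it arrives, so the client does not need to poll.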

About Me

M.Math, University of Waterloo
◦ Field: Statistics and Machine Learning

B.S., University of Toronto
◦ Field: Applied Mathematics

Data Scientist Intern, Neon Inc., San Francisco

Back-end Model Developer, MetricAid Inc., Toronto

Strong interest in Deep Learning:
◦ Convolutional Network, Recurrent Network
◦ Applying Deep Learning in NLP

Questions?

Thank you!