Personal Shopping Assistant - A Big Data Problem

Smart Personal Shopping Assistant (SPA)
Arvind Rapaka, Sairam Bantupalli, Ravindra Nath
SpotDy Inc
www.spotdy.com

Why do we need it?

A personal shopping assistant is a person who helps customers by giving advice and making suggestions. They are employed by department stores.

But suppose you run a mobile/web e-commerce business. How can you turn your mobile/web application into a smart personal shopper for your customers? Enter the Smart Personal Shopping Assistant.

Customer

Store Assistant

Online Store

Why do we need it?

I need skinny pants that girls like. My size is 32 inch waist and 34 length.

Here you go. Let me know if I should filter by price, size, or brand.

I like it. My price range is 40-50 dollars.

I need skinny pants that girls like. 32 inch waist, 34 length.

I like these pants. Let’s buy them.

OK, I have placed the order. You should receive your order by tomorrow. Best of luck.

Why do we need it?

Can you place an order for the red skinny pants that I ordered last year?

Do you want the same size?

Yes

I have placed the order. You should receive your pants by tomorrow.

Process Overview

[Diagram: ASR, Image, and Q&A components backed by the Knowledge Graph / Image DB]

● Speech Recognition
● Image Matching
● Q&A Dialogue

ASR - Automatic Speech Recognition
Q&A - Question and Answer Dialogue

SPA - System Call Flow

[Diagram: SPA system call flow. Inputs: Image, Text, Voice. Components of the SpotDy BigAI™ Platform: ASR, Image Analysis Engine, Q&A Dialogue and IR, Pre-computed KD. Output: Dialogue/Action.]

IR - Information Retrieval
KD - Knowledge Graph DB

Components

Knowledge Database (KD)

● Build Product Knowledge Database
  ○ Classification (LDA, existing taxonomy)
  ○ NLP analysis (CRF, Bayesian, etc.)
  ○ Image analysis / text attribution (SURF)
  ○ Ontologies

Image Matching

● Image Analysis
  ○ Extract features (SURF feature extraction)
    ■ Find keypoints
  ○ Group descriptors (SURF feature descriptor)
    ■ Keypoints are grouped into descriptors
  ○ Match images against the precomputed descriptor database
  ○ Post-processing
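The matching step above can be sketched as a nearest-neighbour search over precomputed descriptors. This is a minimal illustration, not the SPA implementation: the product ids, the 2-D toy descriptors (real SURF descriptors have 64 or 128 dimensions), and the use of a nearest/second-nearest ratio test as the match criterion are all assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def count_good_matches(query_desc, image_desc, ratio=0.75):
    """Count query descriptors whose nearest neighbour in image_desc is
    clearly closer than the second nearest (ratio test, an assumption here)."""
    good = 0
    for q in query_desc:
        dists = sorted(euclidean(q, d) for d in image_desc)
        if len(dists) >= 2 and dists[0] < ratio * dists[1]:
            good += 1
    return good


def best_match(query_desc, database):
    """Return the id of the database image with the most good matches,
    together with the per-image match counts."""
    scores = {pid: count_good_matches(query_desc, descs)
              for pid, descs in database.items()}
    return max(scores, key=scores.get), scores


# Hypothetical precomputed descriptor database keyed by product id.
database = {
    "red_pants": [[0.0, 0.0], [10.0, 10.0], [5.0, 0.0]],
    "blue_shirt": [[50.0, 50.0], [60.0, 40.0], [55.0, 45.0]],
}
query = [[0.1, 0.0], [9.9, 10.0]]
print(best_match(query, database)[0])  # prints "red_pants"
```

A brute-force scan like this is only workable for small catalogs; at scale the descriptor database would normally sit behind an approximate nearest-neighbour index.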

ASR

● Speech Recognition
  ○ Extract feature vectors
  ○ Speech decoder
    ■ Scoring (DNN)
    ■ Most likely text from the acoustic model (HMM/Viterbi algorithm)
  ○ Pass to Q&A system

Question and Answering

● Query Analysis
  ○ Query processing (stemming, lemmatization, gazetteer, ...)
  ○ Understand user intent (HMM)
    ■ Navigational/specificity
    ■ Initiate dialogue if necessary
  ○ Query POS labelling / entity extraction (CRF)
  ○ Query rewrite / retrieval / post-processing
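To make the query analysis steps concrete, here is a minimal sketch of gazetteer-based annotation applied to the sample query from the earlier slides. It stands in a dictionary lookup and a regular expression for the CRF tagger named above; the gazetteer entries, the label names, and the annotate function are hypothetical.

```python
import re

# Hypothetical gazetteer: surface form -> entity label.
GAZETTEER = {
    "skinny": "FIT",
    "pants": "PRODUCT",
    "red": "COLOR",
}


def annotate(query):
    """Tokenize a query, look tokens up in the gazetteer, and pull out
    numeric size attributes such as "32 inch waist" or "34 length"."""
    text = query.lower()
    tokens = re.findall(r"[a-z0-9]+", text)
    entities = [(t, GAZETTEER[t]) for t in tokens if t in GAZETTEER]
    sizes = re.findall(r"(\d+)\s*(?:inch\s+)?(waist|length)", text)
    return {
        "tokens": tokens,
        "entities": entities,
        "sizes": {name: int(value) for value, name in sizes},
    }


result = annotate("I need skinny pants that girls like. "
                  "My size is 32 inch waist and 34 length.")
print(result["entities"])  # [('skinny', 'FIT'), ('pants', 'PRODUCT')]
print(result["sizes"])     # {'waist': 32, 'length': 34}
```

The structured output (product, fit, sizes) is what a query rewrite step could turn into filters against the indexed KD.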

Q&A Personalization

● Q&A results should be personalized and aggregated based on:
  ○ Past user history
  ○ User geography/demographics
  ○ Occasions such as Christmas, Thanksgiving, etc.

SpotDy BigAI™

Query Results

Algorithms

SURF (Speeded-Up Robust Features)

SURF is a feature detection method that examines an image and extracts features that are distinctive of the objects in it. It is based on SIFT but is faster.

In our case, it helps retrieve similar products based on images.

The process involves:
○ Build scale space
○ LoG approximation
○ Keypoint extraction
○ Generate features

LoG Approximation

● The Laplacian is a 2-D isotropic measure of the 2nd spatial derivative of an image.

● The Laplacian of an image highlights regions of rapid intensity change to detect edges.

● The image is first smoothed with a Gaussian filter, because second derivatives are highly sensitive to noise.
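Combining the Gaussian smoothing and the Laplacian into one filter gives the Laplacian of Gaussian (LoG). The sketch below samples the continuous LoG function on a small grid; the function names and the 5x5, sigma = 1 kernel are illustrative choices. SURF itself does not evaluate this formula directly but approximates the second-derivative filters with box filters.

```python
import math


def log_value(x, y, sigma):
    """Laplacian of Gaussian at (x, y):
    LoG = -(1 / (pi * sigma^4)) * (1 - r^2 / (2 sigma^2)) * exp(-r^2 / (2 sigma^2))
    where r^2 = x^2 + y^2."""
    r2 = x * x + y * y
    s2 = sigma * sigma
    return (-(1.0 / (math.pi * s2 * s2))
            * (1.0 - r2 / (2.0 * s2))
            * math.exp(-r2 / (2.0 * s2)))


def log_kernel(size, sigma):
    """Sample the LoG on a size x size grid centred on the origin (size odd)."""
    half = size // 2
    return [[log_value(x, y, sigma) for x in range(-half, half + 1)]
            for y in range(-half, half + 1)]


kernel = log_kernel(5, 1.0)
for row in kernel:
    print(" ".join(f"{v:7.3f}" for v in row))
```

The kernel is negative at the centre, crosses zero at radius sqrt(2) * sigma, and is positive beyond that, which is why convolving with it responds strongly to blobs and edges at the scale set by sigma.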

GMM/DNN-HMM

An HMM is a generative probabilistic model that provides a framework for modelling time-varying spectral vector sequences. In our case, we use it for speech recognition.

● The GMM/DNN produces posterior probabilities for the HMM states.
● The Viterbi algorithm finds the state sequence most likely to have generated the observation sequence.
● Sub-word HMMs are concatenated to create larger word-level HMMs.

[Diagram: Observations (feature vectors) -> GMM/DNN -> Posterior probabilities -> HMM states (senones)]
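The decoding step can be illustrated with a small Viterbi implementation in log space. It assumes the acoustic model (GMM or DNN) has already produced a score for every state at every frame; the two-state model and all the probabilities below are toy values, not taken from a real recognizer.

```python
import math


def viterbi(log_start, log_trans, log_emit):
    """Most likely state sequence for an HMM, in log space.

    log_start[s]    : log probability of starting in state s
    log_trans[p][s] : log probability of moving from state p to state s
    log_emit[t][s]  : log score of frame t under state s (from the GMM/DNN)
    Returns (best_path, best_log_score).
    """
    n_states = len(log_start)
    # score[s] = best log score of any path that ends in state s at this frame
    score = [log_start[s] + log_emit[0][s] for s in range(n_states)]
    backptr = []
    for frame in log_emit[1:]:
        prev = score
        score, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: prev[p] + log_trans[p][s])
            score.append(prev[best_prev] + log_trans[best_prev][s] + frame[s])
            ptr.append(best_prev)
        backptr.append(ptr)
    # Trace back from the best final state.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


def to_log(rows):
    return [[math.log(v) for v in row] for row in rows]


# Toy two-state model with three frames of per-state scores.
start = [math.log(0.6), math.log(0.4)]
trans = to_log([[0.7, 0.3], [0.4, 0.6]])
emit = to_log([[0.5, 0.1], [0.4, 0.3], [0.1, 0.6]])
path, log_score = viterbi(start, trans, emit)
print(path)  # [0, 0, 1]
```

Working in log space turns products of small probabilities into sums, which avoids numerical underflow on long utterances.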

NLP

The Knowledge Database (KD) is key to query processing and information retrieval.

● NLP is extensively used to process unstructured data in building KD.

Algorithms:
● Conditional Random Fields/MaxEnt for POS tagging, entity extraction, concept tagging, etc.
● LDA for topic analysis and classification

Q&A Dialogue and IR

[Diagram labels: Product Catalog, Product metainfo, NLP Engine, Indexed KD, Query, Query Processing, Annotators/Filters, Results]

● The user query passes through various annotators, including:
  ○ Gazetteer, lemmatization, stemming, POS tagging, entity extraction
● Query rewrite
● Search - similarity (IR). Basic algorithms include:
  ○ Vector space modelling
  ○ BM25F
● Result generation
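As an illustration of the similarity search step, the sketch below scores tokenized product descriptions against a query with plain BM25. BM25F extends this by weighting term frequencies per field (for example title, brand, description); the single-field version here is a simplification. The toy catalog and the parameter values k1 = 1.2 and b = 0.75 are assumptions.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with BM25.

    docs is a list of token lists; returns one score per document.
    """
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            # Smoothed IDF that stays positive for every term.
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            # Term frequency saturated by k1 and normalized by document length.
            norm = tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avg_len))
            score += idf * norm
        scores.append(score)
    return scores


# Hypothetical tokenized product descriptions.
catalog = [
    ["red", "skinny", "pants"],
    ["blue", "denim", "jacket"],
    ["red", "wool", "scarf"],
]
scores = bm25_scores(["red", "skinny", "pants"], catalog)
ranking = sorted(range(len(catalog)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 2, 1]
```

The first product matches all three query terms, the third matches only "red", and the second matches nothing, so it scores zero.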

Scalability

SPA - HA Architecture

Significant computing resources are required to scale to millions of requests in real time.

BigAI™

BigAI™ is purpose-built to scale applications such as SPA.

● Building KD (Knowledge Database)

● Image Repository Store

● Query Processing

● Scalable Machine Learning Models

Q&A