Using AWS to Build a Graph-Based Product Recommendation System (BDT303) | AWS re:Invent 2013
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Andre Fatala & Renato Pedigoni
November 14, 2013
Using AWS to Build a Graph-based Product Recommendation System
Friday, November 15, 13
About Magazine Luiza
Magazine Luiza is one of the largest household appliance retail chains in Brazil, focused on providing durable goods for Brazil's middle and lower-to-middle income classes.
• 731 stores
• 8 distribution centers
• more than 23,000 workers
• 22.8 million customers
• multi-channel strategy
Recommendation systems
Graphs
Graph Stack
Distributed Graph Database
• Used for OLTP queries
• Native integration with TinkerPop
Distributed database management system
• Continuously available with no single point of failure
• Elastic scalability
• Caching layer
• Built-in replication
Storing users data
[Diagram: Elastic Load Balancing in front of Auto Scaling API instances (EC2, m2.xlarge), backed by a Cassandra cluster of m2.xlarge nodes]
In graph words…
• Vertices: person, session, channel, item
• Edges: created, visited, viewed (+1), add_to_cart (+13), bought (+21)
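One way to read the +1, +13 and +21 annotations is as weights added each time a person views, adds to cart, or buys an item. The sketch below assumes that reading. The `WEIGHTS` table, the `record` function, and the in-memory dictionary are hypothetical stand-ins for illustration, not the actual Bob data model.

```python
from collections import defaultdict

# Assumed reading of the slide: each interaction type adds a fixed weight.
WEIGHTS = {"viewed": 1, "add_to_cart": 13, "bought": 21}

# (person, item) -> accumulated score; a stand-in for an edge property.
affinity = defaultdict(int)


def record(person, item, action):
    """Add the weight of one interaction to the person-item score."""
    affinity[(person, item)] += WEIGHTS[action]
    return affinity[(person, item)]


record("renato", "led-tv-42", "viewed")
record("renato", "led-tv-42", "add_to_cart")
score = record("renato", "led-tv-42", "bought")
print(score)  # 1 + 13 + 21 = 35
```

Under this assumption a purchase counts far more than a page view, so a single buyer outweighs many casual viewers when items are ranked.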
Base recommendations
• Who viewed this item also viewed
• Who bought this item also bought
• Bought after viewing this item
• Upselling
How to query the graph for recs?
Gremlin Graph Language
• Groovy DSL for graph traversals
• Easy to learn
• Great community
• Part of the TinkerPop stack
• Works with any Blueprints-enabled graph database
People who viewed a product
g.v(4).in('viewed')
[Diagram: the people Renato and Fatala linked by "viewed" edges to the products LED TV 42", LED TV 40", LCD TV 42" and LED 50"]
Who viewed this product also viewed
g.v(4).in('viewed').out('viewed')
[Diagram: the people Renato and Fatala linked by "viewed" edges to the products LED TV 42", LED TV 40", LCD TV 42" and LED 50"]
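To make the two-hop traversal concrete, here is a minimal plain-Python emulation of `.in('viewed').out('viewed')`. It is not Gremlin and does not talk to a graph database. The `VIEWED` edge list is an assumption, since the slide does not state which person viewed which product, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical "viewed" edges (person -> products); the exact edges on the
# slide are not recoverable, so these six are made up for illustration.
VIEWED = {
    "Renato": ['LED TV 42"', 'LED TV 40"', 'LCD TV 42"'],
    "Fatala": ['LED TV 42"', 'LED TV 40"', 'LED 50"'],
}


def viewers_of(product):
    """Emulates .in('viewed'): people with a viewed edge into the product."""
    return [person for person, items in VIEWED.items() if product in items]


def also_viewed(product):
    """Emulates .in('viewed').out('viewed'), dropping the start product and
    ranking the rest by how many viewers reached each item."""
    counts = Counter()
    for person in viewers_of(product):
        for item in VIEWED[person]:
            if item != product:
                counts[item] += 1
    return counts.most_common()


ranked = also_viewed('LED TV 42"')
print(ranked)
```

The bare Gremlin traversal also walks back to the starting product, so the sketch filters it out before ranking.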
Processing data with Spot Instances
• Bob dispatches a task to Amazon Simple Queue Service (Amazon SQS) containing the product id
• Spot instances (EC2, m1.large) consume Amazon SQS tasks and process W*A* recommendations
• Logs are synced to Amazon Simple Storage Service (Amazon S3)
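The dispatch-and-consume flow can be sketched as a producer and a worker sharing a queue. This sketch uses Python's in-process `queue.Queue` as a stand-in for Amazon SQS so that it runs without AWS access. `dispatch`, `compute_recommendations`, and `worker` are hypothetical names, and the recommendation logic is a placeholder.

```python
import queue

# In-process stand-in for Amazon SQS so the sketch runs without AWS access.
tasks = queue.Queue()
results = {}


def dispatch(product_id):
    """Producer side: enqueue one task carrying only the product id."""
    tasks.put({"product_id": product_id})


def compute_recommendations(product_id):
    """Hypothetical placeholder for the real graph traversal."""
    return [f"{product_id}-related-{n}" for n in range(3)]


def worker():
    """Consumer side: drain the queue, acknowledging each task once done."""
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return
        pid = task["product_id"]
        results[pid] = compute_recommendations(pid)  # idempotent overwrite
        tasks.task_done()  # acknowledge only after the work succeeded


for pid in ("p1", "p2", "p1"):
    dispatch(pid)
worker()
print(sorted(results))
```

Keeping the work idempotent and acknowledging a task only after it finishes suits Spot capacity: with SQS, a message that is not deleted becomes visible again after its visibility timeout, so a task held by a reclaimed instance can be picked up by another worker.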
Personalized e-mails
Users receive e-mails when:
• A product has a price drop
• They abandon a product in the cart
• They visit many similar products
Bobby Mailer
• Bob API notifies the Mailer Manager (m1.large) of a user interaction
• Mailer Manager dispatches a task to Amazon Simple Queue Service (Amazon SQS) containing the customer id
• Spot instances (EC2, m1.large) consume Amazon SQS tasks, find the best recommendation for that user, and send the e-mail through Amazon Simple Email Service (Amazon SES)
• Logs are synced to Amazon Simple Storage Service (Amazon S3)
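The three e-mail triggers can be expressed as a small rule function. This is a minimal sketch under stated assumptions: the `CustomerActivity` fields, the `similar_threshold` parameter, and the use of a repeated category as a proxy for "similar products" are all hypothetical, and a real abandoned-cart rule would also need to know that the session has ended.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerActivity:
    """Hypothetical snapshot of what is known about one customer."""
    cart_items: list = field(default_factory=list)
    viewed_categories: list = field(default_factory=list)
    price_dropped_items: list = field(default_factory=list)


def email_triggers(activity, similar_threshold=3):
    """Return the e-mail types from the slide that apply to this customer."""
    triggers = []
    if activity.price_dropped_items:
        triggers.append("price_drop")
    if activity.cart_items:
        triggers.append("abandoned_cart")
    # "Many similar products": approximated here by repeated categories.
    for category in set(activity.viewed_categories):
        if activity.viewed_categories.count(category) >= similar_threshold:
            triggers.append("similar_products")
            break
    return triggers


activity = CustomerActivity(
    cart_items=["tv"],
    viewed_categories=["tv", "tv", "tv", "fridge"],
)
triggers = email_triggers(activity)
print(triggers)  # ['abandoned_cart', 'similar_products']
```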
Analytics with Faunus
Graph Analytics Engine
• Provides graph input/output formats and a traversal language for graphs
Distributed computing (Amazon EMR)
• Distributed processing of large data sets across clusters
• Designed to scale
• Detects and handles failures at the application layer
Analytics in Graphs with AWS
> g.V.has('element_type', 'person').age.mean()
34.683232
Amazon EMR
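The query filters vertices by type, projects one property, and averages it. A minimal plain-Python sketch of the same three steps follows. The `vertices` list and the `mean_age` helper are hypothetical, and the sketch runs on a single machine rather than as a distributed job on Amazon EMR.

```python
from statistics import mean

# Hypothetical vertices; only "person" vertices carry an age property here.
vertices = [
    {"element_type": "person", "age": 30},
    {"element_type": "person", "age": 40},
    {"element_type": "item", "name": 'LED TV 42"'},
    {"element_type": "person", "age": 35},
]


def mean_age(vs):
    """Filter to person vertices, project the age property, then average."""
    ages = [
        v["age"]
        for v in vs
        if v.get("element_type") == "person" and "age" in v
    ]
    return mean(ages)


average = mean_age(vertices)
print(average)  # 35
```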
Backup process
nodetool script Amazon S3
Infrastructure
[Diagram components: Amazon Route 53; Internet Gateway; Elastic Load Balancing; API instances (EC2, m2.xlarge, Auto Scaling); Amazon ElastiCache cache; Cassandra cluster (m2.xlarge nodes); Amazon EMR; Amazon SQS queues; Spot instances (EC2, m2.xlarge, Auto Scaling); Amazon Simple Email Service (Amazon SES); Amazon S3 for backups and logs]
Metrics
• 4.3 million identified Magazine Luiza customers
• 50,000 product nodes
• 90 million total nodes
• 350 million total edges
• 700 GB of data
• Peaks of 20,000 reads/sec on the Cassandra cluster
Results matter…
10x faster 60%
[Chart: timeline from January 2013 to September 2013 with the phases "Solution A alone", "First Bob tests", "Bob out for 2 weeks" and "Bob alone"; annotation: 190%]
Next steps
• Use Faunus to pre-process all W*A* recommendations
• Apply algorithms to identify communities in the graph
• Cassandra replication between regions
BDT303 Thank You