Recommendation Engine Acceleration

Acceleration Platform

The Xelera Suite software accelerates recommendation engines to reduce operating expenses (OPEX) when they run in the public cloud or in on-premises data centers. It achieves this by offloading machine learning inference to hardware accelerators such as FPGAs. The accelerator software can be used without application code changes because it integrates into the underlying software frameworks. In addition to the recommendation engine accelerator, Xelera provides optional software integration services (in the case of proprietary software APIs), optional machine learning model creation and maintenance, and optional integration into on-premises server infrastructure if required.
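The "no application code changes" claim can be pictured as a backend swap behind a stable prediction interface. The sketch below is only an illustration under assumptions: `Predictor`, `CpuPredictor`, `OffloadedPredictor`, `make_predictor`, and `recommend` are hypothetical names, not part of the Xelera Suite or of any framework, and the "accelerator" backend is simulated on the CPU so that the example runs anywhere.

```python
import math
from typing import Dict, Protocol, Sequence


class Predictor(Protocol):
    """Hypothetical interface that the application code depends on."""

    def predict(self, features: Sequence[float]) -> float:
        ...


class CpuPredictor:
    """Baseline backend: a logistic-regression score computed on the CPU."""

    def __init__(self, weights: Sequence[float]) -> None:
        self.weights = list(weights)

    def predict(self, features: Sequence[float]) -> float:
        z = sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))


class OffloadedPredictor(CpuPredictor):
    """Placeholder for an accelerator-backed backend (assumption).

    A real implementation would hand inference to the hardware; here it
    reuses the CPU math so the sketch stays runnable.
    """


def make_predictor(backend: str, weights: Sequence[float]) -> Predictor:
    """Select the backend from configuration, not from application code."""
    backends = {"cpu": CpuPredictor, "accelerator": OffloadedPredictor}
    return backends[backend](weights)


def recommend(predictor: Predictor, candidates: Dict[str, Sequence[float]]) -> str:
    """Application code: return the candidate with the highest score."""
    return max(candidates, key=lambda name: predictor.predict(candidates[name]))


# Made-up weights and candidate features, for illustration only.
weights = [0.8, -0.4]
candidates = {"ad_a": [1.0, 2.0], "ad_b": [2.0, 0.5]}

cpu_choice = recommend(make_predictor("cpu", weights), candidates)
accel_choice = recommend(make_predictor("accelerator", weights), candidates)
print(cpu_choice, accel_choice)
```

`recommend` is identical for both backends; only the string passed to `make_predictor` changes. That is the property the paragraph above describes, though the actual integration mechanism is not specified in this document.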

Recommendation Engines

• Product/video recommendation, live advertisement, etc.
• Based on Machine Learning
• Up to 100,000 recommendations per second
• Challenge: Costly operation (# cloud servers)

[Figure labels: Recommendation engine (web service), User information, Placed website content]

Typical Recommendation Engine

[Figure labels: Web page, Web service, Ask prediction, Machine Learning model (e.g. decision trees, deep learning, logistic regression, …), Prediction]

Accelerated Recommendation Engine

[Figure labels: Web page, Web service, Ask prediction, ANALYTICS+, Prediction, 25x acceleration]

• More recommendations per second
• Fewer servers
• Lower operational costs

Use Case 1: Real-Time Advertisement Placement (OPEX savings)

• 20,000 user requests per second
• 1,000 parallel advertisement campaigns (Machine Learning models)
• 50 ms round-trip latency constraint

                      # Servers required    Est. cost saving
Traditional           584 (c4.8xlarge)
Xelera-accelerated    22 (f1.2xlarge)       25x
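The brochure does not say how the estimated cost saving was derived. One plausible reading is the ratio of hourly fleet costs, sketched below. The hourly prices are assumptions for illustration and do not come from this document; actual prices vary by region and over time. `fleet_cost_per_hour` and `cost_saving_factor` are hypothetical helper names.

```python
# Assumed hourly on-demand prices in USD (illustrative, not from the brochure).
ASSUMED_USD_PER_HOUR = {"c4.8xlarge": 1.591, "f1.2xlarge": 1.65}


def fleet_cost_per_hour(instance_type: str, servers: int) -> float:
    """Hourly cost of a fleet of identical instances."""
    return ASSUMED_USD_PER_HOUR[instance_type] * servers


def cost_saving_factor(traditional: tuple, accelerated: tuple) -> float:
    """Ratio of the traditional fleet cost to the accelerated fleet cost."""
    return fleet_cost_per_hour(*traditional) / fleet_cost_per_hour(*accelerated)


# Server counts taken from the Use Case 1 table.
use_case_1 = cost_saving_factor(("c4.8xlarge", 584), ("f1.2xlarge", 22))
server_ratio = 584 / 22

print(f"cost saving factor: {use_case_1:.1f}x, server ratio: {server_ratio:.1f}x")
```

Under these assumed prices the cost factor comes out slightly lower than the raw server ratio, because the accelerated instance type is assumed to cost a little more per hour.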

Use Case 2: Real-Time Movie Recommendation (OPEX savings)

• 1,000 user requests per second
• 1,682 movies (Machine Learning models)
• 50 ms round-trip latency constraint (example)

                      # Servers required    Est. cost saving
Traditional           36 (c4.8xlarge)
Xelera-accelerated    1 (f1.2xlarge)        34x
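The server counts in the table can be turned into a rough capacity-planning rule. The sketch below assumes that the number of servers scales linearly with the request rate, which the brochure does not claim. `scaled_servers` is a hypothetical helper, and the 2,500 requests-per-second target is a made-up example load.

```python
def scaled_servers(target_rps: int, reference_rps: int, reference_servers: int) -> int:
    """Servers needed at target_rps, assuming linear scaling from a reference point.

    Uses integer ceiling division to avoid floating-point rounding surprises.
    """
    return -(-target_rps * reference_servers // reference_rps)


# Reference points from the Use Case 2 table: 1,000 requests per second.
REFERENCE_RPS = 1_000
TRADITIONAL_SERVERS = 36
ACCELERATED_SERVERS = 1

# Hypothetical target load, chosen only for illustration.
target_rps = 2_500

traditional_needed = scaled_servers(target_rps, REFERENCE_RPS, TRADITIONAL_SERVERS)
accelerated_needed = scaled_servers(target_rps, REFERENCE_RPS, ACCELERATED_SERVERS)

print(f"traditional: {traditional_needed} servers, accelerated: {accelerated_needed} servers")
```

Because servers come in whole units, the accelerated fleet rounds up from 2.5 to 3 servers at this load, so the ratio between the two fleets at a given load can differ from the 36:1 ratio in the table.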

* Benchmarks obtained with Apache Spark framework; other recommender engine software may deviate from these results
