Building a Recommendation Engine with Spring and Hadoop

Post on 08-Jul-2015

description

Speaker: Michael Minella. Track: Big Data. The Amazons and Googles of the world have had Ph.D.s locked up in back rooms for years, creating algorithms to get you to click on things and subsequently buy stuff. One of the big things those smart people have been working on is the recommendation engine. Today, a recommendation engine isn’t something that only the Amazons of the world can have. With an hour and a handful of open source tools, we’ll build a recommendation engine based on data from the website we probably spend the most time on: Stack Overflow. We’ll use Spring XD and Spring Batch to orchestrate the full lifecycle of Hadoop processing (ingest, process, export) and Apache Mahout to provide the recommendation processing. A basic understanding of Hadoop concepts (what Map/Reduce is) and Spring (basic D/I configuration) is expected for this talk.

Transcript of Building a Recommendation Engine with Spring and Hadoop

BUILDING

ENGINES

WITH SPRING

MICHAEL MINELLA

TWITTER: @MICHAELMINELLA

HOME PAGE: SPRING.IO/TEAM/MMINELLA

WHAT I’M NOT

https://github.com/SpringOne2GX-2014/

THANK YOU

SEBASTIAN SCHELTER

PAT FERREL

RECOMMENDATION ALGORITHMS

LET’S SET SOME EXPECTATIONS

SCALE OF THE PROBLEM

MILLIONS OF USERS

100,000s OF ITEMS

TOOLS AND TECHNOLOGIES

1. SPRING BOOT

2. MYSQL

3. HADOOP

4. SPRING XD

5. MAHOUT

SPRING XD: EXTREME DATA

APPLICATION COMPLEXITY

LOTS OF BOILERPLATE

MANY DOMAINS TO BRIDGE

INCONSISTENT APIS

SOURCE, CHANNEL, SINK (DATA FLOW MODEL) = ADAPTER, CHANNEL, FILTER, TRANSFORMER, ETC. (EIP PATTERNS)

JOB, CONNECTOR (IMPORT/EXPORT) = JOB, ITEMREADER/ITEMWRITER (BATCH PROCESSING)

WORKFLOW, ACTION (WORKFLOW ORCHESTRATION) = JOB, STEP (BATCH PROCESSING)

SPRING XD: EXTREME DATA

SPRING

Ingestion

Orchestration

Extraction

Real-time Analytics

DISTRIBUTED RUNTIME

STREAMING & BATCH

http --port=8181 | filter --expression="payload?.price > 3.00" | hdfs --directory=/xd/dir1
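
Read as a pipeline, the stream definition above appears to accept HTTP posts, keep only payloads whose price exceeds 3.00, and write the survivors to HDFS. The filter stage can be sketched in plain Python as follows. This is only an illustration of the filtering logic, not Spring XD code: the function name `price_filter` and the dictionary payloads are assumptions, and a missing payload or price is simply dropped here to imitate the null-safe `payload?.price` navigation.

```python
from typing import Iterable, Iterator, Optional


def price_filter(payloads: Iterable[Optional[dict]],
                 threshold: float = 3.00) -> Iterator[dict]:
    """Yield only the payloads whose price is above the threshold."""
    for payload in payloads:
        # Null-safe lookup: a missing payload or a missing price is skipped.
        price = payload.get("price") if payload is not None else None
        if price is not None and price > threshold:
            yield payload


# Hypothetical incoming messages, standing in for the HTTP source.
incoming = [
    {"item": "a", "price": 2.50},
    {"item": "b", "price": 4.25},
    None,
    {"item": "c"},
]

# Collecting into a list stands in for the HDFS sink.
kept = list(price_filter(incoming))
```

Only the second message passes the filter; the others are either too cheap or carry no price at all.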

BATCH PROCESSING FOR HEAVY LIFTING

JOB

STEP

TASKLET

CHUNK
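
The chunk idea named above can be sketched as a loop that reads items, processes each one, and hands them to a writer one chunk at a time. The Python below is a simplified illustration loosely modeled on a chunk-oriented step. It is not the Spring Batch API, and the names `run_chunk_step`, `reader`, `processor`, and `writer` are hypothetical.

```python
from itertools import islice
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def run_chunk_step(reader: Iterable[T],
                   processor: Callable[[T], U],
                   writer: Callable[[List[U]], None],
                   chunk_size: int) -> int:
    """Read, process, and write items in chunks; return the chunk count."""
    iterator = iter(reader)
    chunks_written = 0
    while True:
        # Pull up to chunk_size items from the reader.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        # Process every item, then write the whole chunk in one call.
        writer([processor(item) for item in chunk])
        chunks_written += 1
    return chunks_written


written: List[List[int]] = []
count = run_chunk_step(range(5), lambda x: x * 2, written.append, chunk_size=2)
```

Writing once per chunk rather than once per item is what lets a real batch framework commit work in bounded units; a tasklet, by contrast, is a single unit of work with no read/process/write split.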

SPRING FOR APACHE HADOOP

TOTAL LINES OF CUSTOM CODE

47 Lines of Java

29 Lines of XML

6 Spring XD Shell Commands

RECOMMENDATION ALGORITHMS

PREDICTING THE FUTURE

COLLABORATIVE FILTERING

TWO OPTIONS

USER BASED

USER, ITEM 1, ITEM 2, ITEM 3, ITEM 4, ITEM 5

DEREK

MICHAEL

PHIL

DARREL ?
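
The question mark in Darrel's row is what user-based collaborative filtering tries to fill in: find users whose ratings resemble Darrel's, then score the items he has not rated by what those users thought of them. The sketch below illustrates that idea in Python. The rating values are invented for the example (the values from the slide are not available in this transcript), and the choice of cosine similarity with a similarity-weighted average is an assumption, not necessarily what the talk or Mahout uses.

```python
from math import sqrt
from typing import Dict, List

# Invented ratings for illustration only.
ratings: Dict[str, Dict[str, float]] = {
    "derek":   {"item1": 5, "item2": 3, "item4": 4},
    "michael": {"item1": 4, "item2": 2, "item4": 5, "item5": 1},
    "phil":    {"item3": 5, "item5": 4},
    "darrel":  {"item1": 5, "item2": 3},
}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two users' rating vectors."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend_user_based(user: str,
                         data: Dict[str, Dict[str, float]],
                         top_n: int = 3) -> List[str]:
    """Rank unseen items by the similarity-weighted average rating."""
    scores: Dict[str, float] = {}
    weights: Dict[str, float] = {}
    for other, other_ratings in data.items():
        if other == user:
            continue
        sim = cosine(data[user], other_ratings)
        if sim <= 0:
            continue  # users with nothing in common contribute nothing
        for item, rating in other_ratings.items():
            if item in data[user]:
                continue  # only score items the user has not rated
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]


recommendations = recommend_user_based("darrel", ratings)
```

With these made-up numbers, Derek and Michael both resemble Darrel and both rated item 4 highly, so it ranks first; Phil shares no rated items with Darrel and is ignored.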

ITEM BASED

?

ITEM, DEREK, MICHAEL, PHIL, DARREL

ITEM 1

ITEM 2

ITEM 3

ITEM 4

ITEM 5

PEOPLE ARE FUNNY

USER_ID, TAG_ID, VOTES

TAG_ID, TAG_ID, SCORE
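
The two record layouts above describe an item-based transformation: rows of (user, tag, votes) go in, and pairs of tags with a similarity score come out. The Python sketch below shows one way such scores could be computed on a small in-memory data set. It is not the Mahout job used in the talk; the function name `tag_similarities`, the sample rows, and the choice of cosine similarity as the measure are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt
from typing import Dict, Iterable, List, Tuple


def tag_similarities(
    rows: Iterable[Tuple[int, int, float]]
) -> List[Tuple[int, int, float]]:
    """Turn (user_id, tag_id, votes) rows into (tag_id, tag_id, score) rows."""
    # Group votes by user so co-occurring tags can be paired up.
    by_user: Dict[int, Dict[int, float]] = defaultdict(dict)
    for user_id, tag_id, votes in rows:
        by_user[user_id][tag_id] = by_user[user_id].get(tag_id, 0) + votes

    dot: Dict[Tuple[int, int], float] = defaultdict(float)
    norm_sq: Dict[int, float] = defaultdict(float)
    for tags in by_user.values():
        for tag, votes in tags.items():
            norm_sq[tag] += votes * votes
        # Every pair of tags voted on by the same user co-occurs once.
        for a, b in combinations(sorted(tags), 2):
            dot[(a, b)] += tags[a] * tags[b]

    # Cosine similarity between the two tags' vote vectors.
    return [(a, b, d / sqrt(norm_sq[a] * norm_sq[b]))
            for (a, b), d in sorted(dot.items())]


# Invented sample input: (user_id, tag_id, votes).
rows = [(1, 10, 3), (1, 20, 3), (2, 10, 1), (2, 20, 1), (2, 30, 4), (3, 30, 2)]
scores = tag_similarities(rows)
```

In this sample, tags 10 and 20 always receive the same votes from the same users, so they score highest, while tag 30 overlaps with them only through one user and scores lower.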

LOOKING INTO THE FUTURE

SNAPSHOTS AHEAD!

MAP REDUCE

MAP REDUCE PROBLEMS

API IS VERY LOW LEVEL

HIGH LATENCY

NOT ALWAYS A GOOD FIT

POTENTIALLY FASTER

HIGHER LEVEL APIS

scala> textFile.count()

res0: Long = 126

USER_ID, TAG_ID, VOTES

TAGID,TAGID:RANK…

1. USE A SEARCH ENGINE

2. DATA NORMALIZATION

Learn More. Stay Connected.

Spring Batch
Project: spring.io/spring-batch
GitHub: github.com/spring-projects/spring-batch
Jira: jira.spring.io/browse/BATCH

Spring Boot
Project: spring.io/spring-boot
GitHub: github.com/spring-projects/spring-boot

Spring XD
Project: spring.io/spring-xd
GitHub: github.com/spring-projects/spring-xd
Jira: jira.spring.io/browse/XD

Twitter: twitter.com/springcentral

YouTube: spring.io/video

LinkedIn: spring.io/linkedin

Google Plus: spring.io/gplus

Icons from The Noun Project:

Servers by Jaime Carrion
Question by Jessica Lock
Check Box by Hrag Chanchanian
Crane by Kenneth Von Alt
Nut by Naomi Atkinson
Funnel by Volodin Anton
Circuit by Piotrek Chuchla
Puzzle by Matthew Hall
Database by Anton Outkine
Network by Mister Pixel
Puzzle by Eric M. Ellis
People by Wilson Joseph
Maze by Gilbert Bages
Fork by Dmitry Baranovskiy
Algebra by Ilsur Aptukov
Thumbs Up by Jørgen Bovolden
Scale by Edward Boatman
Users by Vittorio Maria Vecchi
Flow Chart by Michael Wohlwend
Running by Dimiter Petrov
Move by Dmitry Baranovskiy
Abacus by Alice Mortaro
Stopwatch by Scott Lewis
Lego by jon trillana
Lego by Jake Dunham

THE END