Data Science Stack with MongoDB and RStudio

Building an easy data science platform with RStudio Server on top of your MongoDB

Winston Chen – Lead Software Engineer

What does Fliptop do?

• Predictive lead scoring, using data science
  – Pull opportunity/lead/contact data from CRM
  – Aggregate company data and social data from various data sources and the internet
  – Over 3000 signals
  – Build conversion/revenue model
  – Predict lead conversion and revenue

Our Platform Stack

• Java/Scala
• Liftweb
• JMS/Storm
• MongoDB/MySQL

Our Machine Learning Stack

• Python
• NumPy/SciPy/pandas
• Bottle (RESTful server)

So, where is R then?

• Problem:
  – Data is stored in MongoDB
    • Sales lead data
    • Sales opportunity data
    • Sales contact data
  – It’s hard to view/digest/process data on the fly using the MongoDB console
    • (X) Text processing for insight extraction?
    • (X) Prototype cool machine learning algorithms on the fly?
• Solution:
  – R and RStudio Server
    • Why not Scala?
    • Why not Python/IPython?

MongoDB Console & Query

RStudio Server

Pull MongoDB data into R data frame

• rmongodb (https://github.com/gerald-lindsly/rmongodb)

Transform into an R data frame

1 – Get the total count of your data set

2 – Construct vectors for each column

3 – Loop through the cursor and insert values

Where are my apply functions? Too bad. We are iterating over a Mongo cursor :P

4 – Go into sub bson block to extract data (optional)

5 – Construct data frame and return
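The five steps above can be sketched in base R as follows. This is a minimal illustration, not the code from the talk. The cursor is a hypothetical in-memory stand-in so the snippet runs without a MongoDB server, and the field names (email, score, company) are invented. With rmongodb, the stand-in calls would roughly correspond to functions such as mongo.count, mongo.find, mongo.cursor.next, mongo.cursor.value and mongo.bson.value, which is worth checking against the package documentation.

```r
# Hypothetical stand-in for a MongoDB cursor: a closure over a list of records.
make_cursor <- function(records) {
  i <- 0
  list(
    count = function() length(records),
    nxt   = function() { i <<- i + 1; i <= length(records) },
    value = function() records[[i]]
  )
}

# Invented sample documents, each with a nested "company" block.
leads <- list(
  list(email = "a@example.com", score = 0.82,
       company = list(name = "Acme", size = 120L)),
  list(email = "b@example.com", score = 0.35,
       company = list(name = "Globex", size = 45L)),
  list(email = "c@example.com", score = 0.67,
       company = list(name = "Initech"))
)

cursor_to_data_frame <- function(cursor) {
  # 1 - get the total count so the vectors can be preallocated
  n <- cursor$count()

  # 2 - construct one vector per column
  email        <- character(n)
  score        <- numeric(n)
  company_name <- character(n)
  company_size <- rep(NA_integer_, n)

  # 3 - loop through the cursor and fill in row i
  i <- 0
  while (cursor$nxt()) {
    i <- i + 1
    doc <- cursor$value()
    email[i] <- doc$email
    score[i] <- doc$score

    # 4 - go into the nested block; missing fields stay NA
    company <- doc$company
    company_name[i] <- company$name
    if (!is.null(company$size)) company_size[i] <- company$size
  }

  # 5 - construct the data frame and return it
  data.frame(email, score, company_name, company_size,
             stringsAsFactors = FALSE)
}

df <- cursor_to_data_frame(make_cursor(leads))
print(df)
```

Preallocating the vectors from the count avoids growing them inside the loop, which is the main reason for fetching the count first when the data cannot be processed with the apply family.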

The full example code is available here: http://goo.gl/tlyyXp

We now have a data frame, built from MongoDB BSON, to play with.

This is NOT a BIG DATA Stack

• It takes around 1 min to process 900 MB+ of BSON from Mongo.
• NOT a big data stack
  – Data should fit into RAM
• Most of the data in the business world is not big anyway.
• It works fine for us (m1.large machine in AWS)
  – CRM data is never big, not even after we pull in 3000+ additional signals.
  – The term ‘Big Data’ is seriously overrated; ‘Data Science’, however, is the key term here.
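One way to check the "data should fit into RAM" condition is to measure the data frame once it is loaded. The sketch below uses base R's object.size on an invented data frame. The column names and the 7.5 GiB budget (roughly what an m1.large offers) and the 50% headroom rule are all assumptions for illustration.

```r
# Hypothetical data frame standing in for one pulled from MongoDB.
n <- 100000
df <- data.frame(
  lead_id = seq_len(n),
  score   = runif(n),
  source  = sample(c("crm", "social", "web"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# object.size() gives an estimate of the memory an R object occupies.
size_bytes <- as.numeric(object.size(df))
print(format(object.size(df), units = "MB"))

# Rough headroom check against an assumed RAM budget.
# R often copies data while transforming it, so leave plenty of slack.
ram_budget_bytes <- 7.5 * 1024^3
fits <- size_bytes < 0.5 * ram_budget_bytes
print(fits)
```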

@Fliptop, we now use RStudio to do

• Data insight extraction
• Algorithm prototyping

If you REALLY want BIG Data

• Look into: HDFS + Pig/Hive + Hue
  (Any other suggestions from the audience?)

• Winston Chen
  – Personal blog: http://winston.attlin.com/
  – Twitter: @wingchen83
  – winston@fliptop.com

• Fliptop is hiring data scientists. Please email winston@fliptop.com
