Big Data Analytics on the Google Cloud Platform
Agenda
1. 5 minutes - Introductions
2. 15 minutes - Introduction to the Google Cloud Platform & its various Big Data services
3. 10 minutes - Showcasing various Online Retail Analytics - User, Site & Product Analytics
4. 15 minutes - Live Demonstration - Ingestion of session log data to visualization in Tableau
5. 15 minutes - Q&A Session (can extend beyond this based on audience enthusiasm & participation!)
App Engine, BigQuery, Cloud SQL, Cloud Storage, Compute Engine
Google Cloud Platform – Key Components
https://cloud.google.com
Tweet @ThirdEyeCss
A highly elastic, scale-on-demand infrastructure for deploying and running front-end web applications
Diagram components: App Master; Front End Instances 1 to n; App Server Instances 1 to n; Datastore; Memcache; Static Files
App Engine - Architecture
https://cloud.google.com/products/app-engine
Scales on demand
Very low barrier to entry
No initial hardware costs
Issues such as scalability and reliability are non-issues
Can handle very large amounts of data
Can handle very large user volumes, including sudden spikes, by scaling elastically
App Engine - Advantages
https://cloud.google.com/products/app-engine
A column-oriented data store that can store and process billions of rows of data
SQL-like query syntax for querying data
Run ad-hoc queries against multi-terabyte data sets in seconds
Highly scalable, reliable and secure because it runs on Google's core platform infrastructure
BigQuery
https://cloud.google.com/products/big-query
Supports all the main ETL and BI tools like Informatica, Talend, QlikView and Tableau
Primarily used for real-time data analysis and visualization
Integration with App Engine through APIs
BigQuery
https://cloud.google.com/products/big-query
SQL Access
Only SELECT operations
No CREATE, UPDATE or DROP
Analysis of unstructured data using the REGEXP_* family of functions
JOINs are possible with both small (< 8 MB of compressed data) and large tables; large-table joins carry a performance penalty
BigQuery
https://cloud.google.com/products/big-query
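The regular-expression point above can be illustrated outside BigQuery. The sketch below uses Python's standard `re` module as a stand-in for the kind of extraction a REGEXP function would do inside a query; it is not BigQuery code. The `/products/<id>` URL layout and the sample paths are assumptions made up for illustration.

```python
import re

# Hypothetical URL layout: product pages look like /products/<numeric id>.
PRODUCT_PATTERN = re.compile(r"^/products/(\d+)")

def extract_product_id(page_path):
    """Return the product id embedded in a page path, or None if absent."""
    match = PRODUCT_PATTERN.search(page_path)
    return match.group(1) if match else None

# Made-up sample paths as they might appear in a session log.
paths = ["/products/1042?ref=home", "/cart", "/products/77/reviews"]
product_ids = [extract_product_id(p) for p in paths]
```

Paths that are not product pages yield `None`, so they can be filtered out before counting product views.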
Programmatic Access
bq command-line tool, Google API client library, REST API
Google API client library supports various languages like Java, Python, JavaScript, Ruby, PHP and Google Apps Script
Authentication is handled via OAuth 2.0
With the REST API, credentials and HTTP requests have to be handled manually by the user
BigQuery
https://cloud.google.com/products/big-query
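To make "handled manually" concrete, here is a minimal sketch of building a request for BigQuery's REST query endpoint with only the Python standard library. The request is constructed but never sent. The project id, the token value and the `build_query_request` helper name are placeholders invented for this example; obtaining a real OAuth 2.0 access token is a separate step not shown here.

```python
import json
import urllib.request

def build_query_request(project_id, access_token, sql):
    """Build (but do not send) an HTTP POST for the BigQuery query endpoint."""
    url = (
        "https://bigquery.googleapis.com/bigquery/v2/"
        f"projects/{project_id}/queries"
    )
    # The query text travels in a JSON body.
    body = json.dumps({"query": sql}).encode("utf-8")
    headers = {
        # The OAuth 2.0 access token is passed as a bearer token.
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

# Placeholder values; nothing is sent over the network here.
request = build_query_request("my-project", "placeholder-token", "SELECT 1")
```

With a client library, the URL, headers and token refresh shown here are taken care of by the library instead.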
Use Cases
Batch analysis of large data sets
Real-time analytics for dashboard-type applications
Pre-processing very large data sets and serving the results in real time
Visualization using third-party tools that call BigQuery APIs
BigQuery
https://cloud.google.com/products/big-query
MySQL database running on the Google Cloud Platform
Easy migration from local MySQL instances to Cloud SQL
Highly scalable and reliable with replication
Supports all major MySQL features including stored procedures, triggers and views
GUI front end for easy administration and operations
Built on top of core Google infrastructure
Easy integration with App Engine
Cloud SQL
https://cloud.google.com/products/cloud-sql
A highly reliable cloud storage platform for storing and accessing vast amounts of data
Can be used for data archival and content delivery
Data can be ingested and processed by other Google Cloud Services
Accessible through GUI, command line and APIs
Diagram components: Cloud Storage, Cloud SQL, BigQuery, Custom App
Cloud Storage
https://cloud.google.com/products/cloud-storage
Object store that can deliver content very efficiently over the internet
Not a mountable file system
Buckets are the basic container. They cannot be nested and can reside in the US or EU geographies.
Objects are stored in buckets. They are immutable and can be up to 5 TB in size.
ACLs can be set up for Google users, groups, app domains and authenticated users, with READ, WRITE or FULL_CONTROL permissions.
Signed URL access for anonymous users
Can be accessed using XML and JSON REST APIs
Command-line access using the gsutil tool
App Engine Storage API for access from App Engine
Cloud Storage
https://cloud.google.com/products/cloud-storage
Infrastructure as a service
Linux virtual machines, with associated storage and network infrastructure, are hosted by Google
Can run any type of application or workload in the Google cloud, on the same core Google infrastructure
Highly elastic and scalable
A typical use case would be to provision a Hadoop cluster on demand, using tens to hundreds of virtual machines as name node and data nodes
Compute Engine
https://cloud.google.com/products/compute-engine
Various machine type configurations are possible, such as High Memory, High CPU and Standard
Very easy provisioning and management using cloud management software like RightScale
CentOS and Debian are the default OSes currently supported
Typical use cases are batch processing, log analysis, I/O-intensive workloads and Hadoop on the cloud (MapReduce)
Compute Engine
https://cloud.google.com/products/compute-engine
Healthcare Store
Large online retailer’s Health Store website.
Thousands of health care products are sold per month.
VP OF MARKETING: These large online retailers are killing us! I need to increase sales. I need to understand my site visitors better. Can Big Data Analytics help?
DATA SCIENTIST: Yes, Big Data Analytics can help! Google’s Cloud Platform handles all the complexities of Big Data processing. We start with regular session log files.
Session Log File (W3C compliant)
• Time & date when the visitor came on site
• Unique user & session ID
• Product page visited by the user
• Referral site
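As a rough illustration of reading such a file, the sketch below parses lines in the W3C extended log format, where a `#Fields:` directive names the columns of the entries that follow. The sample lines are made up, and the `x-user-id` and `x-session-id` field names are assumptions for this example; the actual log layout used in the demo is not shown in the slides.

```python
def parse_w3c_log(lines):
    """Parse W3C extended log lines into a list of dicts keyed by field name."""
    fields = []
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive line; only '#Fields:' is needed to interpret entries.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        # Skip malformed entries whose column count does not match the header.
        if len(values) == len(fields):
            records.append(dict(zip(fields, values)))
    return records

# Made-up sample log; '-' marks an empty referrer.
SAMPLE_LOG = """\
#Version: 1.0
#Fields: date time x-user-id x-session-id cs-uri-stem cs(Referer)
2013-06-01 09:15:02 u001 s100 /products/1042 http://www.google.com/
2013-06-01 09:47:30 u002 s101 /products/77 -
""".splitlines()

records = parse_w3c_log(SAMPLE_LOG)
```

Each record then carries the four pieces of information listed on the slide: timestamp, user and session ids, page visited and referral site.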
From the simple log files, we can do sophisticated analytics like these:
User Analytics
• # of unique site visitors, per hour, per day
• # of return site visitors, per hour, per day
• Total # of site visitors, per hour, per day
• Top 10 active users, per hour, per day
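The user metrics above reduce to grouping and counting. The following is a minimal plain-Python sketch, not the BigQuery implementation from the demo. The sample visits are made up, and "return visitor" is assumed here to mean a user who was already seen on an earlier day; the slides do not define the term.

```python
from collections import Counter, defaultdict

# Made-up sample visits for illustration: (date, hour, user_id).
visits = [
    ("2013-06-01", 9, "u001"),
    ("2013-06-01", 9, "u002"),
    ("2013-06-01", 9, "u001"),
    ("2013-06-01", 10, "u003"),
    ("2013-06-02", 9, "u001"),
]

def unique_visitors(rows, per_hour=False):
    """Count distinct users per day, or per (day, hour) when per_hour is True."""
    users = defaultdict(set)
    for date, hour, user in rows:
        key = (date, hour) if per_hour else date
        users[key].add(user)
    return {key: len(members) for key, members in users.items()}

def return_visitors_per_day(rows):
    """Count users per day who were already seen on an earlier day (assumed definition)."""
    seen_before = set()
    counts = {}
    for date in sorted({row[0] for row in rows}):
        todays_users = {user for d, _hour, user in rows if d == date}
        counts[date] = len(todays_users & seen_before)
        seen_before |= todays_users
    return counts

def top_active_users(rows, n=10):
    """Rank users by number of logged visits."""
    return Counter(user for _date, _hour, user in rows).most_common(n)

daily_unique = unique_visitors(visits)
hourly_unique = unique_visitors(visits, per_hour=True)
daily_returning = return_visitors_per_day(visits)
most_active = top_active_users(visits, n=1)
```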
Product Analytics like these:
• Top 10 popular products, per hour, per day
• Top 10 popular products in shopping basket, per hour, per day
• Top 10 bought products, per hour, per day
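A "top 10 products per day" report is a SELECT with GROUP BY, ORDER BY and LIMIT. The sketch below runs that query shape against Python's built-in `sqlite3` module, used purely as a local stand-in so the example is runnable; it is not BigQuery, and BigQuery's SQL dialect may differ in details. The `page_views` table, its columns and the rows are made up, and the CREATE and INSERT statements are local setup only.

```python
import sqlite3

# In-memory SQLite database as a local stand-in for a BigQuery table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (view_date TEXT, product_id TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [
        ("2013-06-01", "1042"),
        ("2013-06-01", "1042"),
        ("2013-06-01", "1042"),
        ("2013-06-01", "77"),
        ("2013-06-01", "77"),
        ("2013-06-01", "5"),
        ("2013-06-02", "77"),
    ],
)

# Count views per product for one day and keep the ten most viewed.
TOP_PRODUCTS_SQL = """
SELECT view_date, product_id, COUNT(*) AS views
FROM page_views
WHERE view_date = ?
GROUP BY view_date, product_id
ORDER BY views DESC, product_id
LIMIT 10
"""

top_products = conn.execute(TOP_PRODUCTS_SQL, ("2013-06-01",)).fetchall()
```

The same shape, with a filter on basket or purchase events, would cover the other two product metrics.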
Conversion Analytics like these:
• # of users who added products to shopping basket, per hour, per day
• # of users who actually bought products, per hour, per day
• % of users who browsed, added products to shopping cart & actually bought, per hour, per day
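The conversion percentages above can be sketched as a simple funnel over distinct users. This is an illustrative calculation under assumptions, not the deck's actual queries: the action names (`browse`, `add_to_cart`, `purchase`) and the sample events are made up, and a user is only counted at a stage if they also reached the previous one.

```python
def conversion_funnel(events):
    """Summarize a browse -> cart -> purchase funnel over distinct users.

    events: iterable of (user_id, action) pairs, where action is one of
    'browse', 'add_to_cart' or 'purchase' (hypothetical action names).
    """
    users = {"browse": set(), "add_to_cart": set(), "purchase": set()}
    for user_id, action in events:
        if action in users:
            users[action].add(user_id)

    browsed = users["browse"]
    added = users["add_to_cart"] & browsed
    bought = users["purchase"] & added

    def pct(part, whole):
        # Guard against division by zero when nobody reached the stage.
        return 100.0 * len(part) / len(whole) if whole else 0.0

    return {
        "added_to_cart": len(added),
        "bought": len(bought),
        "browse_to_cart_pct": pct(added, browsed),
        "browse_to_purchase_pct": pct(bought, browsed),
    }

# Made-up events for a single hour or day bucket.
events = [
    ("u1", "browse"), ("u2", "browse"), ("u3", "browse"), ("u4", "browse"),
    ("u1", "add_to_cart"), ("u2", "add_to_cart"),
    ("u1", "purchase"),
]
funnel = conversion_funnel(events)
```

Running the function once per hour or per day bucket gives the "per hour, per day" breakdown.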
Google Cloud Platform’s BigQuery
Tables on BigQuery with data from Session Log Files.
Running a Query on BigQuery
Queries on BigQuery are very much SQL-like, easy to develop, and return results fast.
Visualize BigQuery’s Results in Tableau
Tableau provides an easy & effective way to develop dashboards & reports.
Site Analytics – Referral Site Comparisons
Traffic referred to the site from other sources like Google.com
Product Analytics - Product Purchase Trends
Analysis of specific products as purchased on site over hours / days in a month
Conversion Analytics - Product Added to Cart vs. Bought
Analysis of which products were placed in cart vs. actually bought over hours / days in a month
Conversion Analytics - Conversion Rate Trends
Analysis of which products were placed in cart vs. actually bought over hours / days in a month
You now know how your products are selling, when they are selling, which referring site helps the most, and other such info. You now have the power of Big Data Analytics at your fingertips!
Third Eye is Google’s Partner for the Google Cloud Platform
We are listed on Google’s Cloud Platform partner site: https://cloud.google.com/partners/
Tweet @ThirdEyeCss
Contact: Dj Das, Founder & CEO, [email protected]
Alan Merrihew, VP of Business Development, [email protected]
Phone - (408) 462-5257
Corporate Site - ThirdEyeCSS.com
Big Data Training - ThirdEyeClasses.com
Big Data Educational Seminars - BigDataCloud.com, BigDataCloudToday.com, meetup.com/BigDataCloud
Big Data Jobs - jobs.BigDataCloud.com
Big Data Analytics As a Service - ClustersTogo.com, Power140.com, Raaser.com, PowerI90.com