Big Data Analytics from a Practitioner's View
Sep 2013
Raghu Kashyap
About Raghu Kashyap
Areas of Responsibility
Data Insights Group (Site analytics,
Competitive Intelligence, Big Data)
Orbitz India, supporting Analytics
and BI teams
US, Europe, Australia (APAC)
Personal
Director – Data Insights Group
Strong background in technology (13
years), with a passion for and experience
in analytics (4 years) and big data (3.5
years)
Masters in Computer Science
Golf, traveling, helping non-profit
organizations, spending time with my
wife and 2 boys
Twitter: @ragskashyap
Blog: http://kashyaps.com
Email: [email protected]
Orbitz Worldwide
Challenges
Lack of multi-dimensional capabilities
Heavy investment in tools
Precision vs Accuracy
Data Governance
No data unification or uniform platform
across organizations and business
units
No easy data extraction capabilities
Hadoop history at OWW
Web Analytics & Big Data
OWW generates a couple of million air and hotel
searches every day.
Massive amounts of data: over a hundred GB
of log data per day.
Expensive and difficult to store and process
this data using existing data infrastructure.
Love Thy Hadoop
Long term storage for
very large data sets.
Open access to
developers and analysts.
Allows for ad-hoc
querying of data and
rapid deployment of
reporting applications.
Hadoop Growth
Hadoop Cluster
Treemap of HDFS storage
Approach with Hadoop and ETL
Raw logs
Flat files
Event Model
MapReduce
ETL
External Tables
Data Warehouse (Greenplum)
GP Connector
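The flow above, from raw logs through a map and reduce pass to flat rows that a warehouse could load, can be sketched in a few lines. This is only an illustration in plain Python, not the Orbitz pipeline: the tab-separated log format, the field names, and every function name here are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical raw log lines: tab-separated timestamp, event type, product.
RAW_LOGS = [
    "2013-09-01T10:00:00\tsearch\thotel",
    "2013-09-01T10:00:05\tsearch\tair",
    "2013-09-01T10:01:00\tbooking\thotel",
    "malformed line",
    "2013-09-01T10:02:00\tsearch\thotel",
]

def map_line(line):
    """Map step: parse one raw line into ((event_type, product), 1) pairs."""
    parts = line.split("\t")
    if len(parts) != 3:
        return []  # skip lines that do not match the assumed format
    _timestamp, event_type, product = parts
    return [((event_type, product), 1)]

def shuffle(pairs):
    """Group mapped values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

def run_job(lines):
    """Run map, shuffle, and reduce over an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_line(line)]
    return reduce_counts(shuffle(mapped))

def to_flat_rows(counts):
    """Emit tab-separated rows, the kind of flat output a warehouse could load."""
    return [f"{event}\t{product}\t{n}"
            for (event, product), n in sorted(counts.items())]

EVENT_COUNTS = run_job(RAW_LOGS)
FLAT_ROWS = to_flat_rows(EVENT_COUNTS)
```

On a real cluster the map and reduce functions would run in parallel over HDFS blocks; the in-memory `shuffle` here only stands in for that grouping step.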
Opportunities
Machine Learning
Site Analytics Data
PPC bidding efficiencies
Internal log analysis (Hgrep)
MVT testing
Advanced Analytics
Show me the money
EFX – Every Friggin X
PPC bidding efficiencies
MAC vs. PC
Marketing Channel optimization
Orbitz.com
Direct
Paid - Brand
Paid - Non-Brand
SEO - Brand
SEO - Non-Brand
Email
Meta
Travel Research
Affiliates
Display Ads
Hotel Rate Cache optimization
Data is collected as part of RCDC.
Includes every live rate search (aka
burst) performed by our hotel stack.
Raw data: ~200 GB, compressed, 10^8
records.
Extraction: <40 GB compressed, 10^9
records.
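A quick back-of-the-envelope check of what those volumes mean per record is sketched below. It assumes the record counts are powers of ten (10^8 raw, 10^9 extracted) and that GB means decimal gigabytes; neither assumption is confirmed by the slide, and the function name is made up for the example.

```python
GB = 10**9  # decimal gigabytes; an assumption, the slide does not say GB vs GiB

def bytes_per_record(total_gb, records):
    """Average compressed bytes per record for a data set of the given size."""
    return total_gb * GB / records

# Assumed figures: ~200 GB over 10^8 raw records, <40 GB over 10^9 extracted.
RAW_BYTES = bytes_per_record(200, 10**8)      # roughly 2000 bytes per raw record
EXTRACT_BYTES = bytes_per_record(40, 10**9)   # upper bound: under 40 bytes each
```

Under these assumptions each extracted record is at least fifty times smaller than a raw one, which is why the extracted set is far cheaper to store and scan.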
MVT
Analyze behavioral and test data from our
multivariate tests (MVT)
DWH Log analysis
• Analyze Greenplum DB logs within Hadoop
to understand data usage patterns.
• Impact analysis
• Hadoop used to analyze the last 30 days
of DB logs.
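One way to picture this kind of usage analysis is to count how often each table is read in the logged queries over a 30-day window. The sketch below is a minimal illustration, not the actual Greenplum log format or the Orbitz job: the `date|user|SQL` line layout, the sample queries, and the regular expression are all assumptions, and the regex only catches simple `FROM` and `JOIN` references.

```python
import re
from collections import Counter
from datetime import date, timedelta

# Hypothetical log format: "YYYY-MM-DD|user|SQL statement".
LOG_LINES = [
    "2013-08-30|analyst1|SELECT * FROM bookings b JOIN hotels h ON b.hotel_id = h.id",
    "2013-08-31|analyst2|SELECT count(*) FROM bookings",
    "2013-07-01|analyst1|SELECT * FROM searches",
    "2013-09-01|etl|INSERT INTO hotels SELECT * FROM staging_hotels",
]

# Matches the table name that follows FROM or JOIN (simple queries only).
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_.]*)", re.IGNORECASE)

def table_usage(lines, today, window_days=30):
    """Count table reads in queries logged within the last `window_days` days."""
    cutoff = today - timedelta(days=window_days)
    usage = Counter()
    for line in lines:
        day_text, _user, sql = line.split("|", 2)
        if date.fromisoformat(day_text) < cutoff:
            continue  # outside the analysis window
        usage.update(name.lower() for name in TABLE_REF.findall(sql))
    return usage

USAGE = table_usage(LOG_LINES, today=date(2013, 9, 2))
```

Tables that never show up in the window (here `searches`) are candidates for a closer look before a schema change, which is one form the impact analysis could take.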
HIPPO is your best friend
• Expect organizational resistance from
unanticipated directions
• You can do wonders in the analytics area if
you get buy-in.
Lessons Learnt
Analytics using Big Data comes with a price.
Data Governance
Senior leadership buy-in
"I can't tell you the key to success, but the key
to failure is trying to please everyone." - Ed
Sheeran
How to capitalize on Big Data?
Learn from people who have already
done this.
DO NOT reinvent the wheel
Buy vs. build balance
Build once and leverage in multiple
places.
Go where clients don't want to go or
can't go in terms of execution.
What matters to Practitioners?
Things change dramatically in the
world of analytics
Being Agile is very important
Dashboards and Reports can take
you only to a certain level
Buy-in from key groups is important
Grow the business and impress the boss
Thank you