Self-Service Analytics on Hadoop: Lessons Learned
DataWorks Summit/Hadoop Summit
June 29, 2016
Drew Leamon
Director – Advanced Technology Solutions
Comcast: Shaping the Future of Media and Technology
High Speed Internet
Forecast
Engineering Design
Budget
Engineering Analysis: Global Central Analysis Team
Animals are Best Suited in Their Native Habitat
Spreadsheets: The Natural Habitat of Analysts
Evolution of Self Service Analytics
SSRS
Self Service: Native Habitat
Limitations of the Spreadsheet Native Habitat
• 1 Million Row Max
Self Service
• Not Even Medium Data
• Not Collaborative
• No Automation
• Not Repeatable
IT Analyst
Self Service: How We Started
Analyst goes to IT, makes a request, and waits weeks to get results
SSRS
• 10 TB Storage
• 1 Compute Node
Not Self Service
• 10 TB (Medium Data)
• Limited Compute
• IT Hand-off
• Consultative service
• Not self service
IT Analysts
Bigger database still meant building dashboards for team
IT Analysts
Still Not Self Service
• 100s of TBs (Large Data)
• Data silos
• IT Hand-off
• Consultative service
• Analysts not SQL experts
Graduated to Specialized Databases
• Clustered Storage
• Columnar Compression
• Clustered Compute
Datameer, native on Hadoop, enables self-service for big data
Analysts
True Self Service
• PB == Big Data
• Data Lake
• Excel-like UI
• No more waiting for IT
Self Service: The New Way
• Clustered Storage
• Columnar Compression
• Clustered Compute
• Liberated Data
Multiple Configurations for Big Data
Engineering Analysis
IP Telephony
Video Research
IP Video Engineering
X1 Operations
Advanced Advertising
Web Analytics
Enterprise Business Intelligence
Network Engineering
Mature
Evolving
On-Boarded
On-Deck
Expanding Use Cases with Datameer
Use Case #1: Comcast Digital Voice
One Of The Largest IP Telephony Networks
Anonymized Call Detail Records (CDR) Data Set
Data complexity from network
Data size: TBs/month
Discovered Unusual Patterns
Noticed large spikes for high-cost areas
Hypothesis: Network Abuse
Analysis Shows Traffic Concentrated in a Few Accounts
30% of this traffic was coming from three accounts.
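The concentration check described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the talk's analysis was done in Datameer, not in Python, and the record layout `(account_id, minutes)`, the function name `top_account_share`, and the sample values are all hypothetical.

```python
from collections import Counter


def top_account_share(records, top_n=3):
    """Return the fraction of total minutes carried by the top_n accounts.

    records: iterable of (account_id, minutes) pairs (hypothetical layout).
    """
    minutes_by_account = Counter()
    for account_id, minutes in records:
        minutes_by_account[account_id] += minutes

    total = sum(minutes_by_account.values())
    if total == 0:
        return 0.0

    # most_common returns (account_id, minutes) pairs, largest first.
    top_total = sum(m for _, m in minutes_by_account.most_common(top_n))
    return top_total / total


# Made-up sample data: three heavy accounts plus 14 ordinary ones.
sample_records = [
    ("acct-a", 70), ("acct-a", 50),   # one account spread over two records
    ("acct-b", 100),
    ("acct-c", 80),
] + [(f"acct-{i}", 50) for i in range(14)]

share = top_account_share(sample_records, top_n=3)
print(f"Top 3 accounts carry {share:.0%} of traffic")
```

A share this high for so few accounts is the kind of signal that would prompt the network-abuse hypothesis; the threshold for "unusual" is a judgment call and is not given in the talk.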
Ongoing Monitoring of Future Abuse
The analyst scheduled a Tableau data extract and built a Tableau dashboard, so the business can now keep an eye out for further abuse.
Result: Future Abuse Prevented and More
• Abuse detected
• Analysts empowered
• Resources saved
• No IT hand-off
• Value to organization
• Automated and repeatable
Use Case #2: Customer Perspective
How to measure customer experience from the customer perspective
Millions of Viewing Experiences
Improved Customer Experience through Data Analytics
Findings / Analysis
Best Practices
Improved Customer Experience
Data-driven scheduling
Dataflow Automation
Solution:
- Build views quickly & aggregate large datasets.
- Early visibility of data in Hadoop
Analyze
- Create repeatable processes through automated workflow
• Aggregations of large datasets from disparate data sources: RDBMS, HDFS, APIs
• Data Joins / Data Quality Checks / Pipeline between clusters
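The blend steps listed above (join, data quality check, aggregation) can be sketched as follows. This is a hedged illustration only: the talk performs these steps in Datameer's UI, and the field names (`device_id`, `model`, `errors`), the function `blend`, and the sample rows are hypothetical stand-ins for data that would come from an RDBMS, HDFS, or an API.

```python
def blend(viewing_events, device_catalog):
    """Join events to device attributes and aggregate per device model.

    Events whose device_id has no catalog entry are counted as unmatched
    rather than silently dropped (a simple data quality check).
    """
    devices = {d["device_id"]: d for d in device_catalog}
    sessions, errors, unmatched = {}, {}, 0

    for event in viewing_events:
        device = devices.get(event["device_id"])
        if device is None:
            unmatched += 1
            continue
        model = device["model"]
        sessions[model] = sessions.get(model, 0) + 1
        errors[model] = errors.get(model, 0) + event["errors"]

    return {"sessions": sessions, "errors": errors, "unmatched": unmatched}


# Made-up rows standing in for two disparate sources.
events = [
    {"device_id": "d1", "errors": 0},
    {"device_id": "d2", "errors": 2},
    {"device_id": "d1", "errors": 1},
    {"device_id": "d9", "errors": 0},  # no catalog entry
]
catalog = [
    {"device_id": "d1", "model": "model-x"},
    {"device_id": "d2", "model": "model-y"},
]

result = blend(events, catalog)
print(result)
```

Reporting the unmatched count alongside the aggregates makes it easy to spot when one source has drifted out of sync with the other before the numbers reach a dashboard.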
Blend
Share
Insights
Result: Data-driven Customer Viewing Experience Enhancements
• Customer Experience Improved
• Analysts empowered
• Capital Spend Directed Intelligently
• No IT hand-off
• Value to organization
• Automated and repeatable