Building Big Data - cooladata.com
Building Big Data
The True Cost of Building Analytics
A Big Data Platform - Basic Assumptions
• Extreme scalability - scales up to tracking billions of events
• Serves more than one analytical app
• Real time - streaming (not Hadoop) to provide real-time capabilities
• Permanent history - events are stored for several months
• Direct access to individual events - granular data is accessible
• Supports any analysis - both business and technical roles can answer any question
Load Once Use Many
[Diagram: a single Big Data Platform serving multiple applications: campaign management, A/B testing, recommendation engine, dashboards, push notifications, data mining modules, affiliate reporting]
Typical Big Data Architecture
[Diagram: architecture layers]
• Real-time streams: real-time processing (Kafka, Storm, Kinesis)
• ETL
• Structured and unstructured data (HDFS, S3)
• Real-time processing: HBase, Couchbase, Cassandra
• Interactive processing: Exasol, Vertica, Redshift
• Batch processing: Hadoop, Hive
• Data visualization (Excel, Tableau, QlikView)
Big Data Platform Components
[Diagram: Track, Collect, Enrich, Analyze, Store, Visualize, together with Admin and Audit and Control]
Best of Breed Approach
Component | Cloud Service/Open Source | On Premises/Private Cloud
Collectors | CloudFront, Amazon Kinesis | Storm, Kafka
Process | Amazon EMR, Google data pipeline | Hadoop distributions, Talend, Informatica
Storage | Amazon S3, Google storage | EMC, IBM, HP
Analytical DB | Google BigQuery, Amazon Redshift, Impala, Spark | Vertica, Exasol, Infobright
Real-time DB | MongoIO, Redis Labs | MongoDB, Couchbase, Cassandra
Visualization/Analytics | ChartIO, D3.js, Google Spreadsheet | Looker, Tableau, QlikView, MicroStrategy
HR Costs
The most significant cost of building a Big Data analytics solution is human resources.
The solution is complex and requires specialized expertise across several roles.
We included
• ETL
• Cloud infra experts
• Java/Python developers
• DBA
• Dashboard developers
• Analysts
We didn’t include
• QA
• A 24/7 support team
HR Costs
Role (headcount in FTE) | 1 TB per Q | 1 TB per Month | 3 TB per Month | Monthly cost per FTE
Back-end dev | 1 | 1.5 | 2.5 | $8,000
Infra/system mgmt | 0.2 | 0.5 | 0.5 | $10,000
DBA | 0.3 | 0.5 | 1 | $10,000
Analyst | 1 | 2 | 2 | $8,000
Total headcount | 2.5 | 4.5 | 6 |
Total monthly headcount cost | $21,000 | $38,000 | $51,000 |
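The totals in the HR table follow from multiplying each role's headcount by its monthly cost per FTE. A minimal Python sketch of that arithmetic is shown here; the figures are copied from the table, while the dictionary and function names are illustrative and not part of the original deck.

```python
# Monthly cost per full-time equivalent (FTE), taken from the HR Costs table.
# Role keys are illustrative names chosen for this sketch.
MONTHLY_COST_PER_FTE = {
    "backend_dev": 8_000,
    "infra_mgmt": 10_000,
    "dba": 10_000,
    "analyst": 8_000,
}

# Headcount (in FTE) per role for each data-volume scenario in the table.
FTE_BY_SCENARIO = {
    "1 TB per quarter": {"backend_dev": 1.0, "infra_mgmt": 0.2, "dba": 0.3, "analyst": 1.0},
    "1 TB per month": {"backend_dev": 1.5, "infra_mgmt": 0.5, "dba": 0.5, "analyst": 2.0},
    "3 TB per month": {"backend_dev": 2.5, "infra_mgmt": 0.5, "dba": 1.0, "analyst": 2.0},
}


def total_headcount(fte_by_role):
    """Sum of FTEs across all roles, rounded to one decimal place."""
    return round(sum(fte_by_role.values()), 1)


def monthly_hr_cost(fte_by_role):
    """Headcount times monthly cost per FTE, summed over roles, in whole dollars."""
    return round(sum(fte * MONTHLY_COST_PER_FTE[role] for role, fte in fte_by_role.items()))


if __name__ == "__main__":
    for scenario, ftes in FTE_BY_SCENARIO.items():
        print(f"{scenario}: {total_headcount(ftes)} FTE, ${monthly_hr_cost(ftes):,} per month")
```

Changing a headcount or a per-FTE cost in the dictionaries is a quick way to see how sensitive the HR total is to staffing assumptions.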
Cloud Infra Costs
The infrastructure of an analytics solution consists of data storage, servers, network and monitoring
tools. All costs are proportional to the platform’s size.
We included a production environment for
• Servers
• Storage
• Network
We did not include
• Dev and test environments
All costs are based on cloud commodity hardware.
Keep in mind: appliances or special requirements such as large memory (RAM) or SSD are much more
expensive.
Cloud Infra Costs
Monthly cost | 1 TB per Q | 1 TB per Month | 3 TB per Month
Servers | $1,500 | $3,000 | $12,000
Storage | $100 | $800 | $2,000
Network | $300 | $1,000 | $3,000
Total infra cost | $1,900 | $4,800 | $17,000
Software Costs Monthly
Monthly cost | 1 TB per Q | 1 TB per Month | 3 TB per Month
ETL/Hadoop | $100 | $500 | $1,000
Analytical DB | $500 | $1,000 | $5,000
Visualization tool (10-25 users) | $1,000 | $1,000 | $2,000
Total software costs | $1,600 | $2,500 | $8,000
We converted perpetual license and maintenance costs into an ongoing monthly cost.
DB & ETL costs are based on cloud services.
The cost of visualization tools is based on market leaders.
Overall Costs
Cost | 1 TB per Q | 1 TB per Month | 3 TB per Month
Infrastructure | $1,900 | $4,800 | $17,000
Software | $1,600 | $2,500 | $8,000
Human resources | $21,000 | $38,000 | $51,000
Total monthly | $24,500 | $45,300 | $76,000
Total annual | $294,000 | $543,600 | $912,000
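The overall figures are the sum of the three monthly cost categories, with the annual total being twelve times the monthly total. The sketch below reproduces that calculation in Python and also derives the share of human resources in each scenario; the numbers come from the Overall Costs table, and all names are illustrative assumptions.

```python
# Monthly cost per category for each scenario, copied from the Overall Costs table.
# Dictionary keys and function names are illustrative.
MONTHLY_COSTS = {
    "1 TB per quarter": {"infrastructure": 1_900, "software": 1_600, "human_resources": 21_000},
    "1 TB per month": {"infrastructure": 4_800, "software": 2_500, "human_resources": 38_000},
    "3 TB per month": {"infrastructure": 17_000, "software": 8_000, "human_resources": 51_000},
}


def total_monthly(costs):
    """Sum of infrastructure, software, and human resources costs for one month."""
    return sum(costs.values())


def total_annual(costs):
    """Twelve months of the total monthly cost."""
    return 12 * total_monthly(costs)


def hr_share(costs):
    """Fraction of the total monthly cost that is human resources."""
    return costs["human_resources"] / total_monthly(costs)


if __name__ == "__main__":
    for scenario, costs in MONTHLY_COSTS.items():
        print(
            f"{scenario}: ${total_monthly(costs):,} per month, "
            f"${total_annual(costs):,} per year, HR share {hr_share(costs):.0%}"
        )
```

With these inputs, human resources account for roughly 86%, 84%, and 67% of the monthly total across the three scenarios, which is consistent with the statement that HR is the most significant cost.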
When Should You Build?
• Analytics is your core business
• Your analytics is tightly coupled to your operational system
• Your analytical requirements are special
• You have sufficient time and resources
Big Data is not a project – it’s an ongoing process!