Phases of Big Data Challenges @ Nokia
Transcript of Phases of Big Data Challenges @ Nokia
Yekesa Kosuru, HERE.com, Nokia
Hadoop Innovation Summit, February 20 & 21, 2013, San Diego
Phases of Big Data Challenges @ Nokia
Agenda
• Phases of Big Data Challenges @ Nokia
– Who we are
– Big data platform
– Use case data flows
– High-level architecture
– Challenges
• Phases of challenges
Who we are – disrupting the future
Image: device feature callouts (accelerometer, GPS, waterproof, 12h battery, Bluetooth, 2GB storage, barometer, NFC, gyroscope, magnetometer)
Location Platform, Enabling Contextually Rich Mobile Experiences
Apps
Smart Data
Platform
Content
Positions, Maps, Traffic, Places, Directions, Guidance
Big Data Flows and Differentiates
We collect user data… on all supported platforms… to be made available for analysis, enabling feedback loops for continuous improvement, a location-optimized experience, CRM, etc.
Diagram labels: Nokia Account, Big Data Analytics
Phase 0
2008 – ’10: Build technology platform, get data
Business Challenges
• Data silos, no unique identifiers, missing semantics
• Multiple sources: overlapping, conflicting
• Timely processing of large volumes and high velocity of data
• Partial, insufficient, inaccurate, or inconsistent data
• Unknown data/wire formats, security, privacy, and other policies
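"No unique identifiers" and "overlapping, conflicting" sources together mean records must be matched and reconciled before analysis. A minimal sketch, assuming point-of-interest records from three hypothetical sources (`reference`, `3rd_party`, `probe`) with an invented precedence order: it derives a surrogate key from a normalized name plus rounded coordinates, then keeps the highest-priority non-null value for each field. This illustrates one common technique, not the method Nokia used.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical source ranking: a lower number wins when sources conflict.
SOURCE_PRIORITY = {"reference": 0, "3rd_party": 1, "probe": 2}

Key = Tuple[str, float, float]


@dataclass
class PoiRecord:
    source: str
    name: str
    lat: float
    lon: float
    phone: Optional[str] = None


def match_key(rec: PoiRecord) -> Key:
    """Surrogate key for sources that share no unique identifier:
    case/whitespace-normalized name plus coordinates rounded to 3 decimals
    (roughly 100 m of latitude)."""
    normalized = " ".join(rec.name.lower().split())
    return (normalized, round(rec.lat, 3), round(rec.lon, 3))


def merge(records: List[PoiRecord]) -> Dict[Key, dict]:
    merged: Dict[Key, dict] = {}
    # Visit higher-priority sources first so their values survive conflicts.
    for rec in sorted(records, key=lambda r: SOURCE_PRIORITY[r.source]):
        entry = merged.setdefault(match_key(rec), {"sources": []})
        entry["sources"].append(rec.source)
        for field in ("name", "lat", "lon", "phone"):
            value = getattr(rec, field)
            # Keep the first non-null value seen; lower-priority sources only fill gaps.
            if value is not None and field not in entry:
                entry[field] = value
    return merged
```

Rounding to a grid is the simplest possible blocking key; duplicates that straddle a grid boundary will not match, so real pipelines usually add a fuzzier second pass.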
Central Big Data Platform created
Using different big data sets… to verify map accuracy and create the Motion Graph
Big Data Analytics Platform Data Flows
Diagram elements:
• Sources: Web Applications, Device Applications, Probes, 3rd Party
• Data: Activity Logs, Activity/Sensor, Device, User Profile, POI, Map, Reference Data
• Processing on HDFS: Data Intake; ETL, data crunching, attribution, ML algorithms; Aggregation
• Stores: VShards (NoSQL), Analytical DBMS, Analytics Cluster, Data Asset Catalog
• Consumers: Reports, Dashboards, Data Discovery, Interactive Queries, Batch Queries
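The ETL, data crunching, and aggregation steps in this flow run as MapReduce-style jobs on Hadoop. The in-memory sketch below shows the shape of such a job, assuming a hypothetical tab-separated activity log (`timestamp`, `user_id`, `event`): the map step parses and cleans records, dropping malformed ones, the shuffle groups by key, and the reduce step aggregates counts. It mimics the programming model only; it does not use the Hadoop API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """ETL step: parse raw log lines, drop malformed ones, emit (event, 1)."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3 or not parts[2].strip():
            continue  # partial or inconsistent record: skip it
        _timestamp, _user_id, event = parts
        yield event.strip().lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Aggregation step: sum the values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle(map_phase(lines)))
```

For example, `run_job(["2012-06-01T10:00\tu1\tRoute_Request", "bad line"])` returns `{"route_request": 1}`, because the malformed line is dropped during the map step.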
Technology Platform
• Hadoop, R, VShards (KV)
• SDK, Scribe, FTP
• Hive, Pig
• Analytical DBMS, Export/Import
• Workflow Engine, Config./Deploy, Monitor/Alerts, Data Pipeline Scheduler
• Security: Kerberos & ACL
• On-Premise & Cloud Infrastructure
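VShards is listed here as the key-value (KV) store. The slides do not describe its partitioning scheme, so the following is only a generic sketch of hash-based sharding through a fixed set of virtual shards, a common way to spread keys across nodes. The shard count, node names, and the `ShardedKV` class are all hypothetical: each key hashes to one of `NUM_VSHARDS` virtual shards, and each virtual shard is placed on a physical node.

```python
import hashlib
from typing import Any, Dict, List

NUM_VSHARDS = 16  # hypothetical: fixed number of virtual shards


def vshard_for(key: str) -> int:
    """Stable hash (not Python's per-process randomized hash()) -> virtual shard id."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_VSHARDS


def assign(nodes: List[str]) -> Dict[int, str]:
    """Place virtual shards on physical nodes round-robin."""
    return {shard: nodes[shard % len(nodes)] for shard in range(NUM_VSHARDS)}


class ShardedKV:
    """Hypothetical in-memory stand-in: one dict per physical node."""

    def __init__(self, nodes: List[str]) -> None:
        self.placement = assign(nodes)
        self.stores: Dict[str, Dict[str, Any]] = {node: {} for node in nodes}

    def _node(self, key: str) -> str:
        return self.placement[vshard_for(key)]

    def put(self, key: str, value: Any) -> None:
        self.stores[self._node(key)][key] = value

    def get(self, key: str, default: Any = None) -> Any:
        return self.stores[self._node(key)].get(key, default)
```

Routing through virtual shards rather than hashing directly to nodes means adding a node only changes the shard-to-node table; whole shards move, and no key needs to be rehashed.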
Data Platform
• Self-serve tools
• ETL, aggregation, machine learning
• Data quality, Data Asset Catalog
• Data, metadata, operational data
Collect → Ingest → Organize → Analyze → Deliver
• Technology Platform
Phase 1 – 2012
2008 – ’10: Build technology platform, get data
2011: Enhance platform, more data, simple analytics, data crunching
2012: PBs of data, hundreds of users, thousands of jobs, complex analytics, multiple clusters
2012 Production Statistics
• Tens of PBs of data across Nokia
• Multi-tenant, multi-petabyte analytics cluster
• 10–20K+ jobs per day
• 600+ internal users
• 300M+ KV queries
• Terabytes flowing in every day
• Multiple data centers around the world
Challenges With Big Data
• Complex ecosystem of technologies: many moving parts, slower deploy cycles, complex data integration
• Capacity & scale issues: provision for peak or sustained load? For storage or compute?
• DBMS: great for performance & data management, but can't scale (price/performance & ACIDity)
• Hadoop: great for ETL, but poor on query performance & data management; not interactive
• Data and metadata fragmentation
Big Data Capacity Issues
• Spiky workloads
• Capacity provisioning
– Peaks
– Sustained loads
• How many clusters? SLA / ad hoc / research
– Multiple data centers
– Data duplication
• Tenancy: single or multi
• TCO: Hadoop can get expensive; storage & compute are tightly coupled; idle machines
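The peak-versus-sustained provisioning trade-off is simple arithmetic over an hourly demand profile. A minimal sketch, assuming a hypothetical spiky workload measured in nodes needed per hour: a fixed cluster sized for the peak sits mostly idle, while one sized for the sustained (median) load cannot serve the spike.

```python
from statistics import median
from typing import Dict, List


def provisioning_report(hourly_demand: List[int], cluster_size: int) -> Dict[str, float]:
    """Utilization, idle capacity, and unmet demand for a fixed-size cluster."""
    capacity = cluster_size * len(hourly_demand)  # node-hours provisioned
    used = sum(min(d, cluster_size) for d in hourly_demand)  # node-hours actually busy
    unmet = sum(max(d - cluster_size, 0) for d in hourly_demand)  # demand over capacity
    return {
        "utilization": used / capacity,
        "idle_node_hours": capacity - used,
        "unmet_node_hours": unmet,
    }


# Hypothetical spiky workload: nodes needed in each of six hours.
DEMAND = [20, 20, 20, 20, 100, 20]
for label, size in (("peak", max(DEMAND)), ("sustained", int(median(DEMAND)))):
    print(label, provisioning_report(DEMAND, size))
```

With these numbers, the peak-sized cluster (100 nodes) is about 33% utilized with 400 idle node-hours, while the sustained-sized cluster (20 nodes) is fully utilized but leaves 80 node-hours of demand unserved.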
Cloud helps with some issues
• Operational & IT complexity reduced: API-based spin-up & tear-down, rapid deployments, faster cycles
• Pay for what is used
• Capacity issues mitigated: idle machines and peaks are no longer an issue; scale elastically up and down
• Decoupled storage and compute makes sense
• Stateless architecture: recycle slow/bad machines; instead of rolling upgrades, do rolling replacement
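"Pay for what is used" and "elastically scale up and down" can be quantified by comparing node-hours. A minimal sketch, assuming a hypothetical per-node throughput (`jobs_per_node_hour`), instant spin-up, and per-hour billing: each hour runs just enough nodes for that hour's jobs, and the total is compared with a fixed cluster sized for the busiest hour.

```python
import math
from typing import List, Tuple


def elastic_plan(hourly_jobs: List[int], jobs_per_node_hour: int, min_nodes: int = 1) -> List[int]:
    """Nodes to run each hour: enough for that hour's jobs, never below a floor."""
    return [max(min_nodes, math.ceil(jobs / jobs_per_node_hour)) for jobs in hourly_jobs]


def compare_node_hours(
    hourly_jobs: List[int], jobs_per_node_hour: int, min_nodes: int = 1
) -> Tuple[int, int]:
    """Return (elastic node-hours, node-hours for a fixed cluster sized for the peak hour)."""
    plan = elastic_plan(hourly_jobs, jobs_per_node_hour, min_nodes)
    return sum(plan), max(plan) * len(plan)
```

For a hypothetical profile `compare_node_hours([200, 150, 1000, 180], 50)`, the elastic plan runs 4, 3, 20, and 4 nodes for 31 node-hours, against 80 for a fixed 20-node cluster. The sketch ignores spin-up latency and data locality; decoupling storage from compute is what makes it practical to add and drop compute nodes freely.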
Phase 2
2008 – ’10: Build technology platform, get data
2011: Enhance platform, more data, simple analytics
2012: PBs of data, hundreds of users, thousands of jobs, simple & complex analytics
2013: Still pending challenges
Still Pending
• Data and metadata fragmentation; need deeper integration into all tools/frameworks
• Advanced analytics: data science problems are hard and inefficient to implement in MapReduce/RDBMS
Complex Analytics
• Mathematicians think in terms of arrays, not MapReduce
• Data science tools can't efficiently handle big data
• Data partitioning is naïve; indexing won't scale
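To see the gap between array thinking and MapReduce, consider a matrix-vector product. In array form it is a single expression per row. In MapReduce form, the matrix must be flattened into (row, column, value) triples, each triple emits a partial product keyed by row, and a reduce step sums them. The sketch below implements both in plain Python, assuming a small dense matrix for illustration. It is not a distributed implementation, and iterative algorithms would repeat the MapReduce pass on every iteration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[int, int, float]


def matvec_array(A: List[List[float]], x: List[float]) -> List[float]:
    """Array view: y[i] = sum_j A[i][j] * x[j]."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def to_triples(A: List[List[float]]) -> List[Triple]:
    """Flatten a matrix into (row, col, value) records, skipping zeros."""
    return [(i, j, a) for i, row in enumerate(A) for j, a in enumerate(row) if a != 0]


def matvec_mapreduce(triples: List[Triple], x: List[float], n_rows: int) -> List[float]:
    """MapReduce view: map emits (i, a * x[j]); reduce sums partial products per row."""
    partials = ((i, a * x[j]) for i, j, a in triples)  # map step
    sums: Dict[int, float] = defaultdict(float)
    for i, p in partials:  # shuffle + reduce, grouped by row key
        sums[i] += p
    return [sums[i] for i in range(n_rows)]
```

Both functions compute the same result; for `A = [[1, 2], [3, 4]]` and `x = [5, 6]`, each returns `[17, 39]`. The MapReduce version, however, needs an explicit data layout, a key design, and a grouping step for what is one line of linear algebra.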
Big Data Technologies for the Future