Big Data Analytics Platform @ Nokia - Hadoop Illuminated · Search Platform Data flow Cloud...

Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Big Data Analytics Platform @ Nokia

Page 1:

Selecting the Right Tool for the Right Workload

Yekesa Kosuru Nokia

Location & Commerce

Strata + Hadoop World NY - Oct 25, 2012

Big Data Analytics Platform @ Nokia

Page 2:

Agenda

• Big Data Analytics Platform @Nokia
  − Who we are
  − Use case data flows
  − Big data platform
  − Big data challenges

• Selecting the Right Tool for the Right Workload
  − Hadoop vs SQL
  − Which analytical database
  − Why InfiniDB

Page 3:

Nokia Internal Use Only

Great Mobile Products That Sense the World

WIN IN SMART DEVICES

CONNECT THE NEXT BILLION

INVEST IN FUTURE DISRUPTIONS

CREATE A LEADING “WHERE” PLATFORM

Page 4:

One Platform, Enabling Contextually Rich Mobile Experiences

[Diagram labels: Apps; Smart Data Platform; Content; Positions, Maps, Traffic, Places, Directions, Guidance]

Page 5:

Big Data Analytics Platform @Nokia

Page 6:

Business Challenges

• Data silos, missing semantics

• Multiple sources: overlapping, conflicting

• Timely processing of large volumes of data

• Partial, insufficient, inaccurate, or inconsistent data

• Security, privacy and other policies unknown

Central Analytics Platform created!

Page 7:

Statistics

• Tens of PB of data across Nokia

• Multi-tenant, multi-petabyte analytics cluster

• 10-20K+ jobs per day

• 600+ internal users

• 250M+ KV queries

• Over a terabyte flowing every day

• Multiple data centers around the world

Page 8:

Places Data Store (POI) - Use Case

[Diagram labels: Search; Platform; Data flow; Cloud Infrastructure; Account Management; Access Control; Authentication; Places Manager; Suppliers; Places API; Transactional Data; HDFS; Analytical DB; BI; K-V Store; Places Extract Portal; Data Intake; Data Processing; FTP; Oozie Sched; MR Blend; Hive; Pig; MR; SQL; Places Content; Analytics]

Numbered steps in the diagram:

1 User Logs In
2 Place CRUD
3 Supplier Uploads Data
4 ETL and Blend places
5 Places Data Analytics
6 Updated Blend Record
7 Delivered to Online Systems

Page 9:

Big Data Analytics Platform Data Flows

[Diagram labels: Web Applications; Device Applications; Probes; 3rd Party; Activity Logs; Reference Data; Device; User Profile; POI, Map; Activity Sensor; Analytics Cluster; Data Intake; ETL, Algorithms, Aggregation; HDFS; VShards (NoSQL); Data Asset Catalog; Analytical DB; InfiniDB; Oracle; Reports; Dashboards; Data Discovery; Interactive Queries; Batch Queries]

Page 10:

Big Data Analytics Platform Data Flows

[Diagram labels: Analytics Cluster; Data Intake; ETL, Algorithms, Aggregation; HDFS; Data Asset Catalog; Analytical DB; InfiniDB; Oracle; Data Discovery; Interactive Queries; Batch Queries]

Page 11:

Big Data Analytics Platform

• Logical Tiers
  − Technology Platform
  − Data Platform
  − End User Layer (not shown)

[Diagram labels: Data; Technology; Data Intake; ETL, Algorithms, Aggregation; Data Asset Catalog; HDFS; Analytical DB]

Page 12:

Technology Platform

[Diagram labels: Hadoop; R; VShards (KV); Scribe, FTP; Hive, Pig; InfiniDB, Oracle; Export/Import; Workflow Engine; Config./Deploy; Monitor; Alerts; Archiver; Scheduler; Security/Kerberos & ACL; Cloud Infrastructure]

Page 13:

Data Platform

[Diagram labels: Self Serve Tools; ETL, Agg; Algorithms; Data Quality; Data Asset Catalog; Data, Metadata, Operational Data; Workflow Orchestration; Technology Platform]

Page 14:

Data Platform – Analytics Lifecycle

[Diagram labels: Self Serve Tools; ETL, Agg; Algorithms; Data Quality; Data Asset Catalog; Data, Metadata, Operational Data; Technology Platform]

Lifecycle stages: Collect, Ingest, Organize, Analyze, Deliver

Page 15:

Data Platform: Managing the Data Asset

• Data Quality - garbage in, garbage out
  − Rules for validating and cleaning data, other heuristics
  − Trusting your insights
  − Process quality
  − Lightweight governance (semantics, integrity, privacy and quality)

• Data Asset Catalog – describe your data
  − Capture essential metadata and logical domain models for assets
    − physical model, logical model, policies, classifications
    − dependencies with other assets
  − Serves as an entry point to data browsing and asset discovery
  − Insulates subject matter experts from the physical details of the data asset
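The two ideas on this slide, validation rules and a catalog entry that describes an asset, can be illustrated with a small Python sketch. Everything named here (`AssetEntry`, `validate`, the `places` asset, its path and its rules) is hypothetical and chosen for illustration; none of it is taken from Nokia's platform.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical catalog entry. The field names mirror the slide's list:
# physical model, logical model, policies, classifications, dependencies.
@dataclass
class AssetEntry:
    name: str
    physical_model: str                      # where and how the data is stored
    logical_model: Dict[str, str]            # column name -> logical type
    policies: List[str] = field(default_factory=list)
    classifications: List[str] = field(default_factory=list)
    dependencies: List[str] = field(default_factory=list)

# A data quality rule is a named predicate over a single record.
Rule = Callable[[dict], bool]

def validate(records: List[dict], rules: Dict[str, Rule]):
    """Split records into clean ones and (record, failed rule names) rejects."""
    clean, rejects = [], []
    for rec in records:
        failed = [name for name, rule in rules.items() if not rule(rec)]
        if failed:
            rejects.append((rec, failed))
        else:
            clean.append(rec)
    return clean, rejects

# Illustrative asset description (all values are made up).
places = AssetEntry(
    name="places",
    physical_model="hdfs:/example/places",
    logical_model={"id": "string", "lat": "float", "lon": "float"},
    policies=["privacy: no personal data"],
    classifications=["reference data"],
    dependencies=["supplier_uploads"],
)

# Illustrative validation rules for that asset.
rules = {
    "has_id": lambda r: bool(r.get("id")),
    "lat_in_range": lambda r: -90.0 <= r.get("lat", 999.0) <= 90.0,
    "lon_in_range": lambda r: -180.0 <= r.get("lon", 999.0) <= 180.0,
}

records = [
    {"id": "p1", "lat": 60.17, "lon": 24.94},
    {"id": "", "lat": 95.0, "lon": 10.0},
]
clean, rejects = validate(records, rules)
```

Keeping the rules as data next to the catalog entry is one way to let subject matter experts review quality checks without reading the storage details; other designs are equally plausible.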

Page 16:

Big Data Challenges

• At every level: capture, curate, store, process, visualize

• Hadoop or SQL?
  − Performance of analytical database?
  − Batch or interactive analysis
  − Neither SQL nor MR fits all problems

• Data & Metadata Fragmentation

Page 17:

Selecting the Right Tool for the Right Workload

Page 18:

Hadoop vs SQL/Analytical DB

SQL/DW
• Discover the question
• Interactive/fast
• No coding
• Standard industry tools
• Mutable (Type 1 SCD)
• Schema on write
• Analyst
• Time to wisdom

SQL/Analytical DB
• Standard industry tools
• Interactive/fast (secs)
• No coding, e.g. built-in functions
• Reasonably complex
• Discover the question

Hadoop/Hive/MR
• ETL on steroids, scale
• Batch/slow
• Bunch of coding, arbitrarily complex
• Harvest & load into DW
• Discover the answer
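The comparison on this slide can be read as a rough decision rule. The Python sketch below encodes one possible reading of it; the `Workload` fields and the `pick_tools` function are hypothetical names introduced for illustration, and the rule is a simplification, not the process Nokia used.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Workload:
    interactive: bool        # is someone waiting seconds for the answer?
    needs_custom_code: bool  # logic beyond SQL and built-in functions?
    is_bulk_etl: bool        # large-scale harvest or transform step?

def pick_tools(w: Workload) -> List[str]:
    """Return the processing stages suggested by the slide's criteria."""
    stages = []
    # Arbitrarily complex logic or ETL at scale points to batch work on Hadoop.
    if w.needs_custom_code or w.is_bulk_etl:
        stages.append("hadoop")
    # Interactive analysis points to the analytical database. When Hadoop
    # ran first, its output is loaded there ("harvest & load into DW").
    if w.interactive or not stages:
        stages.append("analytical_db")
    return stages

ad_hoc_report = pick_tools(Workload(True, False, False))
nightly_blend = pick_tools(Workload(False, True, True))
dashboard_on_logs = pick_tools(Workload(True, False, True))
```

Here `ad_hoc_report` resolves to the analytical database alone, `nightly_blend` to Hadoop alone, and `dashboard_on_logs` to Hadoop followed by the analytical database.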

Page 19:

Why InfiniDB?

• Works with BI tools (standard JDBC driver)

• Column oriented, MPP, clean architecture

• Horizontal and vertical partitioning, clever pruning

• Stream based MR like processing

• Efficient joins

• No indexes

• Impressive benchmarks

• Cloud deployment model
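Column orientation with pruning can be illustrated in a few lines of Python. This is a generic sketch of min/max block pruning over a single column, assuming a made-up block size and data set; the `Block` class and function names are hypothetical and do not describe InfiniDB's internal format.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Block:
    values: List[int]  # one column's values for a horizontal slice of rows
    lo: int            # smallest value in the block
    hi: int            # largest value in the block

def build_blocks(column: List[int], block_size: int) -> List[Block]:
    """Cut one column into fixed-size blocks and record each block's range."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks

def count_in_range(blocks: List[Block], lo: int, hi: int) -> Tuple[int, int]:
    """Count values in [lo, hi]; also report how many blocks were scanned."""
    matches = 0
    scanned = 0
    for block in blocks:
        # Skip the block when its range cannot overlap the predicate.
        if block.hi < lo or block.lo > hi:
            continue
        scanned += 1
        matches += sum(1 for v in block.values if lo <= v <= hi)
    return matches, scanned

# Illustrative data: 1000 ordered values split into 10 blocks of 100.
column = list(range(1000))
blocks = build_blocks(column, block_size=100)
matches, scanned = count_in_range(blocks, 250, 349)
```

In this example only 2 of the 10 blocks are scanned to find the 100 matching values. The per-block ranges are lightweight metadata, which is one way a system can prune data without maintaining conventional indexes.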

Page 20:

InfiniDB vs Hive Performance

[Bar chart: analytic queries A to G on the x-axis, execution time (secs) on the y-axis from 0 to 2500, series InfiniDB (sec) and Hive (sec)]

Query   InfiniDB (sec)   Hive (sec)
A
B       76.32            2155.92
C       25.59            1181.48
D       59.72            1497.22
E       1.8              446.5
F       12.38            1307.38
G       24.32            1886.81
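To make the gap in these measurements easier to read, the short Python sketch below computes, for each query, how many times longer Hive took than InfiniDB. The timings are copied from the slide; query A is left out because its values do not appear in the transcript. The variable names are chosen for this example.

```python
# Execution times in seconds as (InfiniDB, Hive), copied from the slide.
# Query A is omitted: its values are missing from the transcript.
timings = {
    "B": (76.32, 2155.92),
    "C": (25.59, 1181.48),
    "D": (59.72, 1497.22),
    "E": (1.8, 446.5),
    "F": (12.38, 1307.38),
    "G": (24.32, 1886.81),
}

# Ratio of Hive time to InfiniDB time for each query.
ratios = {q: hive / infinidb for q, (infinidb, hive) in timings.items()}

for query in sorted(ratios):
    print(f"{query}: Hive took {ratios[query]:.1f}x as long as InfiniDB")

smallest_gap = min(ratios, key=ratios.get)
largest_gap = max(ratios, key=ratios.get)
```

On these numbers the ratio ranges from roughly 25x (query D) to roughly 248x (query E). The slide does not describe the queries, data volume or cluster sizes, so the ratios say nothing beyond this one test.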

Page 21:

InfiniDB Under the Hood

Page 22:

What is InfiniDB?

Scalable

Fast

Simple

Page 23:

Analytics Data Platform Foundation

Analytics Data Platform

Columnar Performance Efficiency

MapReduce style Query Execution

Widely used MySQL Interface

Page 24:

InfiniDB Building Blocks

Purpose built for big data analytics.
• User Module (UM)
• Performance Module (PM)

or … Single Server

Page 25:

InfiniDB Building Blocks

Purpose built for big data analytics.
• User Module (UM): understands SQL
• Performance Module (PM): operates on data blocks

or … Single Server

Page 26:

InfiniDB M/R Style Distribution of Work: “Map-Reduce Inside”

                                InfiniDB DoW                        Hadoop M/R
Scalability                     Linear                              Linear
N-squared Problem               Avoided                             Avoided
Latency                         Low                                 Medium-High
Intermediate Results Handling   Stream-based                        File-based
Report Language                 SQL                                 Erlang M/R, Hive, Pig
Tuning                          Automatic                           Manual
Real-Time Analytics             Real-time access to granular data   Access to pre-defined aggregates
Ad-Hoc                          Full Ad-Hoc performance             None
Data Storage                    Structured                          Unstructured
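The idea of a map-reduce style split between modules, with intermediate results streamed instead of written to files, can be sketched in Python. This is an illustration of the general pattern only: the function names borrow the slide's terms (performance module, user module), but the code, the data and the single-process execution are assumptions and do not reflect InfiniDB's implementation.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, int]  # (group key, measure)

def performance_module(blocks: Iterable[List[Row]]) -> Iterator[Dict[str, int]]:
    """'Map' side: yield a partial SUM per data block as soon as it is ready."""
    for block in blocks:
        partial: Dict[str, int] = {}
        for key, value in block:
            partial[key] = partial.get(key, 0) + value
        # Handed to the consumer in memory; nothing is written to a file.
        yield partial

def user_module(streams: Iterable[Iterator[Dict[str, int]]]) -> Dict[str, int]:
    """'Reduce' side: merge the partial results from every stream."""
    total: Dict[str, int] = {}
    for stream in streams:
        for partial in stream:
            for key, value in partial.items():
                total[key] = total.get(key, 0) + value
    return total

# Illustrative data for a query like:
#   SELECT country, SUM(n) FROM visits GROUP BY country
pm1_blocks = [[("fi", 3), ("us", 1)], [("fi", 2)]]
pm2_blocks = [[("us", 4), ("de", 5)]]

result = user_module([
    performance_module(pm1_blocks),
    performance_module(pm2_blocks),
])
```

In this sketch the two streams are consumed one after the other in a single process; a real MPP engine would run the block-level work in parallel across servers.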

Page 27:

Independent InfiniDB Benchmark

Q1 Series 2 table Joins

Q2 Series 3 table Joins

Q3 Series 4 table Joins

Q4 Series 5 table Joins

Page 28:

Takeaways

• Hadoop is good but….

• Pay attention to data quality

• Hadoop or SQL

• Describe your data

Page 29:

THANK YOU

Yekesa Kosuru, Distinguished Architect, Nokia
www.nokia.com, @Nokia

Jim Tommaney, CTO, Calpont
www.calpont.com, @Calpont, @InfiniDB
