July 2012 HUG: Using Standard File-Based Applications and SQL-Based Tools with Hadoop


description

MapR makes Hadoop a more open platform by supporting industry-standard interfaces, including NFS and ODBC. The NFS interface enables users to leverage standard file-based applications, and makes it easier to get data into and out of the cluster, while the ODBC interface enables users to leverage standard BI tools and query builders. This talk covers the motivation for supporting industry-standard interfaces as well as several real-world use cases. In addition, this talk explains the technical details behind these capabilities and how they actually work.

Transcript of July 2012 HUG: Using Standard File-Based Applications and SQL-Based Tools with Hadoop

Page 1: July 2012 HUG: Using Standard File-Based Applications and SQL-Based Tools with Hadoop

1©MapR Technologies - Confidential

Using Standard File-Based Applications and SQL-Based Tools with Hadoop


Tomer Shiran [email protected] Director of Product Management, MapR Technologies

http://info.mapr.com/HUG-7-2012


The MapR Distribution for Apache Hadoop

The open, enterprise-grade distribution for Apache Hadoop
– Open source components
  • Hive, Pig, Cascading, HBase, ZooKeeper, Oozie, Flume, Sqoop, Whirr, …
– Enhancements to make Hadoop more open and enterprise-grade

Fastest growing distribution
– Thousands of clusters deployed

Now available as a service with Amazon Elastic MapReduce (EMR)
– http://aws.amazon.com/elasticmapreduce/mapr


MapR

Make Hadoop more open

Make Hadoop enterprise-grade

This presentation


Not All Applications Use the Hadoop APIs

Applications and libraries that use files and/or SQL

Applications and libraries that use the Hadoop APIs

30 years
100,000s of applications
10,000s of libraries
10s of programming languages


Hadoop Needs Industry-Standard Interfaces

Hadoop API
• MapReduce and HBase applications
• Mostly custom-built

NFS
• File-based applications
• Supported by most operating systems

ODBC
• SQL-based tools
• Supported by most BI applications and query builders


NFS


Your Data is Your Data

HDFS-based Hadoop distributions do not (cannot) support NFS

Your data is your data – make sure you can access it
– Why store your data in a system which cannot be accessed by 95% of the world’s applications and libraries?

Access to HDFS source code != access to your data


The NFS Protocol

RFC 1813

Very simple protocol

Random reads/writes
– READ: read 'count' bytes starting at 'offset' of 'file'
– WRITE: write the buffer 'data' starting at 'offset' of 'file'

HDFS does not support random writes so it cannot support NFS

WRITE3res NFSPROC3_WRITE(WRITE3args) = 7;

struct WRITE3args {
    nfs_fh3    file;
    offset3    offset;
    count3     count;
    stable_how stable;
    opaque     data<>;
};

READ3res NFSPROC3_READ(READ3args) = 6;

struct READ3args {
    nfs_fh3 file;
    offset3 offset;
    count3  count;
};
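
From an application's point of view, the READ and WRITE calls above correspond to positional reads and writes on an ordinary file. The following is a minimal sketch of that access pattern, not MapR code: the helper names `read_at` and `write_at` are made up for illustration, and a temporary local file stands in for a file on an NFS mount. It assumes a Unix-like system, where `os.pread` and `os.pwrite` are available.

```python
import os
import tempfile


def read_at(path: str, offset: int, count: int) -> bytes:
    """Read up to `count` bytes starting at `offset` (the shape of an NFS READ)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, count, offset)
    finally:
        os.close(fd)


def write_at(path: str, offset: int, data: bytes) -> int:
    """Overwrite bytes in place starting at `offset` (the shape of an NFS WRITE)."""
    fd = os.open(path, os.O_WRONLY)
    try:
        return os.pwrite(fd, data, offset)
    finally:
        os.close(fd)


# Stand-in for a file on an NFS mount (hypothetical; a real path would differ).
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"0123456789")
    demo_path = handle.name

written = write_at(demo_path, 3, b"XYZ")   # random write in the middle of the file
result = read_at(demo_path, 0, 10)         # read the whole file back
os.remove(demo_path)
```

The write in the middle of the file is the operation the slide singles out: a storage layer that only supports appends cannot serve it.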


Hadoop Was Designed to Support Multiple Storage Layers

[Diagram: MapReduce runs on top of the Hadoop FileSystem API (the o.a.h.fs.FileSystem interface), which has multiple storage-layer implementations]
• HDFS: o.a.h.hdfs.DistributedFileSystem
• S3: o.a.h.fs.s3native.NativeS3FileSystem
• Local File System: o.a.h.fs.LocalFileSystem
• FTP: o.a.h.fs.ftp.FTPFileSystem
• MapR storage layer: com.mapr.fs.MapRFileSystem
Also labeled in the diagram: NFS interface
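
The idea of one interface with several interchangeable storage layers can be sketched in a few lines. This is only an illustration of the pattern, not Hadoop's Java API: the class names, the `REGISTRY` dictionary, and the in-memory backends are all hypothetical, and the URI schemes are used here purely as lookup keys.

```python
from abc import ABC, abstractmethod
from typing import Dict
from urllib.parse import urlparse


class FileSystem(ABC):
    """Hypothetical stand-in for a common file system interface."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        """Return the full contents of `path`."""


class InMemoryFileSystem(FileSystem):
    """Toy backend; a real one would talk to a storage layer."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = dict(files)

    def read(self, path: str) -> bytes:
        return self._files[path]


# Map each URI scheme to the backend that serves it.
REGISTRY: Dict[str, FileSystem] = {
    "hdfs": InMemoryFileSystem({"/data/a.txt": b"from hdfs"}),
    "maprfs": InMemoryFileSystem({"/data/a.txt": b"from mapr"}),
}


def read_uri(uri: str) -> bytes:
    """Pick a backend by URI scheme, then read through the shared interface."""
    parsed = urlparse(uri)
    return REGISTRY[parsed.scheme].read(parsed.path)


hdfs_bytes = read_uri("hdfs://namenode/data/a.txt")
mapr_bytes = read_uri("maprfs:///data/a.txt")
```

The caller (`read_uri` here, MapReduce in the diagram) never needs to know which storage layer is behind the interface.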


One NFS Gateway


Multiple NFS Gateways


Multiple NFS Gateways with Load Balancing


Multiple NFS Gateways with NFS HA (VIPs)


Customer Examples: Import/Export Data

Network security vendor
– Network packet captures from switches are streamed into the cluster
– New pattern definitions are loaded into online IPS via NFS

Online measurement company
– Clickstreams from application servers are streamed into the cluster

SaaS company
– Exporting a database to Hadoop over NFS

Ad exchange
– Bids and transactions are streamed into the cluster
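
With the cluster mounted over NFS, streaming data in comes down to ordinary file appends. The sketch below shows the shape of such an ingest step under stated assumptions: the function name `append_events`, the directory layout, and the sample records are invented for illustration, and a temporary directory stands in for the NFS mount point.

```python
import os
import tempfile
from typing import Iterable


def append_events(directory: str, filename: str, events: Iterable[str]) -> str:
    """Append one event per line to a file under `directory`; return its path."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    with open(path, "a", encoding="utf-8") as out:
        for event in events:
            out.write(event + "\n")
    return path


# Stand-in for the cluster's NFS mount point (hypothetical).
mount_point = tempfile.mkdtemp()
clicks_dir = os.path.join(mount_point, "clickstream")

log_path = append_events(clicks_dir, "events.log", ["user=1 page=/home"])
append_events(clicks_dir, "events.log", ["user=2 page=/cart"])

with open(log_path, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
```

No Hadoop client library is involved; any program that can write a file can feed the cluster this way.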


Customer Examples: Productivity and Operations

Retailer
– Operational scripts are easier with NFS than DFS + MapReduce
  • chmod/chown, file system searches/greps, make, tab-complete
– Consolidate object store with analytics

Credit card company
– User and project home directories on Linux gateways
  • Local files, scripts, source code, …
  • Administrators manage quotas, snapshots/backups, …

Large Internet company
– Web servers serve MapReduce results (item relationships) directly from cluster

Email marketing company
– Object store with HBase and NFS
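
The "file system searches/greps" point can be illustrated with a plain directory walk, which works on any mounted file system. This is a hedged sketch rather than a MapR tool: `grep_tree` is a hypothetical helper, the file names and contents are made up, and a temporary directory stands in for the mounted cluster.

```python
import os
import tempfile
from typing import List, Tuple


def grep_tree(root: str, needle: str) -> List[Tuple[str, int, str]]:
    """Return (relative path, line number, line) for each line containing `needle`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if needle in line:
                        rel = os.path.relpath(path, root)
                        matches.append((rel, lineno, line.rstrip("\n")))
    return matches


# Stand-in for a directory on the NFS-mounted cluster (hypothetical).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "jobs"))
with open(os.path.join(root, "jobs", "run.log"), "w", encoding="utf-8") as f:
    f.write("start\nERROR disk full\ndone\n")
with open(os.path.join(root, "notes.txt"), "w", encoding="utf-8") as f:
    f.write("nothing here\n")

hits = grep_tree(root, "ERROR")
```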


ODBC


ODBC

ODBC – Open DataBase Connectivity
– Open standard API for accessing a SQL-based backend
– Developed by Microsoft and Simba Technologies in 1992

Flagship API for SQL-based BI and reporting
– Excel, Tableau, MicroStrategy, Crystal Reports, …

Advanced ODBC drivers use the latest 3.52 specification


MapR ODBC Driver

MapR provides a Hive ODBC 3.52 driver
– Developed in partnership with ODBC inventor Simba Technologies
– Compliant with latest ODBC 3.52 specification
  • 32- and 64-bit platform support
  • Windows and Linux

Enables direct SQL access to MapR-stored data by translating SQL to HiveQL

SQLizer enables seamless connectivity
– Provides ANSI SQL-92 front-end
– Targeted for existing apps that generate standard SQL queries
– Transforms SQL query into HiveQL query
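
Because the driver speaks ODBC, any ODBC client can issue queries, not only BI tools. The sketch below shows roughly what that looks like from a script. It rests on several assumptions not stated in the slides: the third-party `pyodbc` package is installed, a data source named `MapRHive` (a hypothetical name) has been configured for the driver, and the driver expects autocommit mode. Only the connection-string helper runs without a configured driver.

```python
def build_connection_string(dsn: str, user: str = "", password: str = "") -> str:
    """Build an ODBC connection string from the standard DSN/UID/PWD keywords."""
    parts = [f"DSN={dsn}"]
    if user:
        parts.append(f"UID={user}")
    if password:
        parts.append(f"PWD={password}")
    return ";".join(parts)


def run_query(connection_string: str, sql: str):
    """Run `sql` over ODBC and return all rows (needs pyodbc and a configured DSN)."""
    import pyodbc  # third-party; imported lazily so the helper above has no dependency

    # autocommit=True is an assumption; adjust to what the driver requires.
    connection = pyodbc.connect(connection_string, autocommit=True)
    try:
        cursor = connection.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        connection.close()


# "MapRHive" and "analyst" are placeholder values, not names from the talk.
conn_str = build_connection_string("MapRHive", user="analyst")
```

A call such as `run_query(conn_str, "SELECT COUNT(*) FROM some_table")` would then send standard SQL to the driver, which per the slide translates it to HiveQL; the table name here is likewise a placeholder.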


Example: Tableau


Example: Tableau


Example: Open source query builder (Kaimon)


Example: Microsoft Excel


Join MapR

Join the fastest growing Hadoop company

Open positions in every discipline
– Engineers
– Solution Architects
– Product Management

Email [email protected]


Time for Questions

Download slides or send me an email
– http://info.mapr.com/HUG-7-2012

Download MapR to learn more
– www.mapr.com/download