Hadoop file systems


Transcript of Hadoop file systems

Page 1: Hadoop file systems

@wattsteve

Analytics on your data in place

Steve Watt, Red Hat

CC flickr Barta IV

Page 2: Hadoop file systems

Hadoop at Red Hat

Page 3: Hadoop file systems

But tonight I have my community hat on

CC flickr wcdumonts

Page 4: Hadoop file systems

Hadoop in 2007

Platform Layers and Technologies

Computational Runtimes: MapReduce, HBase

FileSystems: HDFS or Amazon S3

Infrastructures: x86 or Amazon EC2

CC flickr wwarby

Page 5: Hadoop file systems

Hadoop in 2013

Platform Layers and Technologies

Computational Runtimes: YARN, Giraph, MapReduce, HBase, Phoenix, Spark/BDAS, Drill, Impala, Stinger

FileSystems: HDFS + 13 Other Hadoop FileSystems

Infrastructures: System on a Chip, x86, Virtualization and Cloud

CC flickr lowfatbrains

Page 6: Hadoop file systems

CC flickr grufnik

Observation #1: The Hadoop FileSystem Interface is the keystone of the entire Ecosystem

Page 7: Hadoop file systems

CC flickr traftery

Observation #2: Moving data around just to analyze it is slow and expensive, especially if it requires a redundant repository

Page 8: Hadoop file systems

So how does this work?

By leveraging Hadoop’s pluggable FileSystem architecture

Diagram labels: Any Application; HBase, MapReduce, YARN, Hadoop FS Clients; Hadoop FileSystem Interface; Hadoop FileSystem; Hadoop FileSystem Plugin; FileSystem Implementation
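The pluggable architecture on this slide can be illustrated with a small Python model. This is only a sketch of the idea, not Hadoop's actual Java API: the `FileSystem` class, the `register` and `get_filesystem` helpers, and the `mem` scheme are all hypothetical names introduced for illustration. It shows clients coding against one interface while a registry picks the implementation from the URI scheme.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class FileSystem(ABC):
    """Simplified stand-in for the interface every plugin implements."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class InMemoryFileSystem(FileSystem):
    """Toy implementation; a real plugin would talk to a storage system."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        return self._files[path]

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data


# Registry from URI scheme to implementation class (hypothetical).
REGISTRY: dict[str, type[FileSystem]] = {}


def register(scheme: str, impl: type[FileSystem]) -> None:
    REGISTRY[scheme] = impl


def get_filesystem(uri: str) -> FileSystem:
    """Instantiate whichever implementation is registered for the URI scheme."""
    scheme = uri.split("://", 1)[0]
    if scheme not in REGISTRY:
        raise ValueError(f"no filesystem registered for scheme {scheme!r}")
    return REGISTRY[scheme]()


register("mem", InMemoryFileSystem)

# Client code only ever sees the FileSystem interface.
fs = get_filesystem("mem://demo/")
fs.write("/greeting.txt", b"hello")
print(fs.read("/greeting.txt"))
```

Swapping the storage system then means registering a different class under a different scheme; the client code at the bottom does not change.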

Page 9: Hadoop file systems

Hadoop FileSystem Configuration for HDFS

Diagram labels: Any Application; HBase, MapReduce, YARN, Hadoop FS Clients; Hadoop FileSystem Interface; Hadoop FileSystem; HDFS Plugin; HDFS
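To make the role of configuration concrete, here is a hedged Python model of how a path might be resolved to an implementation. The property names `fs.defaultFS` and `fs.hdfs.impl` and the class name `org.apache.hadoop.hdfs.DistributedFileSystem` are standard Hadoop ones; the host name, port, and the `resolve` helper are assumptions made for this sketch, and real Hadoop performs this lookup in Java.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Hypothetical host and port; in Hadoop these values live in core-site.xml.
conf = {
    "fs.defaultFS": "hdfs://namenode.example.com:8020",
    "fs.hdfs.impl": "org.apache.hadoop.hdfs.DistributedFileSystem",
}


def resolve(path: str, conf: dict[str, str]) -> tuple[str, str]:
    """Return (fully qualified URI, implementation class name) for a path."""
    parsed = urlparse(path)
    if not parsed.scheme:
        # No scheme given: qualify the path against the default filesystem.
        path = conf["fs.defaultFS"].rstrip("/") + "/" + path.lstrip("/")
        parsed = urlparse(path)
    # The scheme selects the implementation through fs.<scheme>.impl.
    return path, conf[f"fs.{parsed.scheme}.impl"]


uri, impl = resolve("/user/steve/logs", conf)
print(uri)
print(impl)
```

Under this model, pointing the same applications at another store is a configuration change: a different default URI plus an `fs.<scheme>.impl` entry for its scheme.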

Page 10: Hadoop file systems

What are some examples of where big data is stored?

- Object Stores

- NoSQL Stores

- Distributed FileSystems

- Network Filers

- Databases

CC flickr birdwatcher63

Page 11: Hadoop file systems

Network Filer Example

Hadoop FileSystem Configuration for GlusterFS

Diagram labels: Any Application; HBase, MapReduce, YARN, Hadoop FS Clients; Hadoop FileSystem Interface; Hadoop FileSystem; GlusterFS Plugin

Page 12: Hadoop file systems

Network Filer - Apache Hadoop on GlusterFS

Diagram labels: GlusterFS; Hadoop Master Services (Resource Manager, Management Server) with plugin and FUSE; Hadoop Workers on Server 1, Server 2, ... Server 50, each with a Node Manager, plugin, FUSE, Trusted Peer and DAS Brick; NFS; SWIFT
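The diagram places the plugin on top of a FUSE mount on each server. One way to read that, sketched below in Python, is that the plugin turns filesystem paths into ordinary file operations on a locally mounted volume. This is an illustrative assumption, not the GlusterFS plugin's actual code: `MountBackedFileSystem` is a hypothetical class, and a temporary directory stands in for the mount point.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


class MountBackedFileSystem:
    """Maps filesystem paths onto files under a local mount point."""

    def __init__(self, mount_point: str) -> None:
        self.root = Path(mount_point).resolve()

    def _local(self, path: str) -> Path:
        local = (self.root / path.lstrip("/")).resolve()
        # Refuse paths that escape the mount point (for example via "..").
        if local != self.root and self.root not in local.parents:
            raise ValueError(f"path escapes mount point: {path}")
        return local

    def write(self, path: str, data: bytes) -> None:
        local = self._local(path)
        local.parent.mkdir(parents=True, exist_ok=True)
        local.write_bytes(data)

    def read(self, path: str) -> bytes:
        return self._local(path).read_bytes()

    def list_dir(self, path: str) -> list[str]:
        return sorted(p.name for p in self._local(path).iterdir())


# A temporary directory stands in for a FUSE mount such as /mnt/glusterfs
# (hypothetical mount point).
with tempfile.TemporaryDirectory() as mount:
    fs = MountBackedFileSystem(mount)
    fs.write("/logs/day1.txt", b"a b c")
    fs.write("/logs/day2.txt", b"d e")
    print(fs.list_dir("/logs"))
```

Because every server sees the same volume through its own mount, a worker can read data where it already lives instead of copying it into a second repository first.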

Page 13: Hadoop file systems

Object Store Example

Hadoop FileSystem Configuration for SWIFT

Diagram labels: Any Application; HBase, MapReduce, YARN, Hadoop FS Clients; Hadoop FileSystem Interface; Hadoop FileSystem; SWIFT Plugin; SWIFT
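An object store has a flat namespace of keys rather than real directories, so a filesystem plugin for one has to present a directory view on top of it. The Python sketch below shows one common way to do that with key prefixes. It is a toy in-memory model under that assumption, not the SWIFT plugin's implementation, and `ObjectStoreFileSystem` is a hypothetical name.

```python
from __future__ import annotations


class ObjectStoreFileSystem:
    """Presents a directory view over a flat key-to-bytes namespace."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    @staticmethod
    def _key(path: str) -> str:
        # "/logs/2013/a.txt" becomes the flat key "logs/2013/a.txt".
        return path.strip("/")

    def write(self, path: str, data: bytes) -> None:
        self._objects[self._key(path)] = data

    def read(self, path: str) -> bytes:
        return self._objects[self._key(path)]

    def list_dir(self, path: str) -> list[str]:
        prefix = self._key(path)
        prefix = prefix + "/" if prefix else ""
        children = set()
        for key in self._objects:
            if key.startswith(prefix):
                # Keep only the first path component after the prefix.
                children.add(key[len(prefix):].split("/", 1)[0])
        return sorted(children)


fs = ObjectStoreFileSystem()
fs.write("/logs/2013/a.txt", b"x")
fs.write("/logs/2013/b.txt", b"y")
fs.write("/logs/readme", b"z")
print(fs.list_dir("/logs"))
print(fs.list_dir("/"))
```

Directories here exist only as shared key prefixes, which is why operations that are cheap on a real filesystem, such as renaming a directory, can be costly on an object store.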

Page 14: Hadoop file systems

NoSQL Example

Hadoop FileSystem Configuration for CassandraFS

Diagram labels: Any Application; HBase, MapReduce, YARN, Hadoop FS Clients; Hadoop FileSystem Interface; Hadoop FileSystem; CassandraFS Plugin

Page 15: Hadoop file systems

NoSQL - Apache Hadoop on CassandraFS

Page 16: Hadoop file systems

CC flickr syume

We are working on filesystem tests within Apache Hadoop Common and Apache Bigtop, as well as opening up ecosystem tools
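The value of shared filesystem tests is that one suite can be run against every implementation. The Python sketch below illustrates that idea only; it is not the Hadoop Common or Bigtop test code, and `check_contract`, `DictFileSystem`, and `WriteOnceFileSystem` are hypothetical names. The second class is deliberately non-conforming so the suite has something to catch.

```python
class DictFileSystem:
    """Conforming toy implementation backed by a dictionary."""

    def __init__(self):
        self._files = {}

    def write(self, path, data):
        self._files[path] = data

    def read(self, path):
        return self._files[path]

    def exists(self, path):
        return path in self._files

    def delete(self, path):
        self._files.pop(path, None)


class WriteOnceFileSystem(DictFileSystem):
    """Deliberately non-conforming: silently ignores overwrites."""

    def write(self, path, data):
        self._files.setdefault(path, data)


def check_contract(fs):
    """Run the same behavioral checks on any implementation; return failures."""
    failures = []
    fs.write("/t/a.txt", b"one")
    if fs.read("/t/a.txt") != b"one":
        failures.append("read-after-write")
    fs.write("/t/a.txt", b"two")
    if fs.read("/t/a.txt") != b"two":
        failures.append("overwrite")
    fs.delete("/t/a.txt")
    if fs.exists("/t/a.txt"):
        failures.append("delete")
    return failures


print(check_contract(DictFileSystem()))       # []
print(check_contract(WriteOnceFileSystem()))  # ['overwrite']
```

A plugin that passes the same checks as HDFS gives the runtimes above it some assurance that they will behave the same way on the alternative store.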

Page 17: Hadoop file systems

Page 18: Hadoop file systems

Page 19: Hadoop file systems

Closing Remarks

1. The number of Hadoop FileSystems available to you continues to increase

2. This is good! A vibrant ecosystem gives you choice

3. Evaluate the option of analyzing your data in place before deploying new environments

CC flickr zoomboy1