Atin Mukherjee GlusterFS Hacker GlusterFS – Architecture & Roadmap GlusterFS Meetup Feb 2015

07/02/2015 GlusterFS Meetup

Agenda

● Introduction to the Gluster community

● Current stable releases

● What is GlusterFS?

● Architecture

● GlusterFS 3.6 Features

● Planned GlusterFS 3.7 Features

● GlusterFS 4.0 and beyond

● Q&A

Introduction to the Gluster community

● Different roles

● Users, testers, supporters, developers, editors, ...

● Different organizations

● Products based on / containing GlusterFS

● Service, consulting and support

● Integration in other (Open Source) projects

Introduction to the Gluster community contd.

● Regular IRC meetings

● Discussions and support over mailing lists and on IRC

● Providing packages (RPMs, DEBs)

● Work with different Linux and BSD distributions to improve portability and availability

● Infrastructure hosting for Gluster related projects

● Gerrit and Jenkins for code review and testing

● Gluster Forge for git/wiki hosting of projects

Introduction to the Gluster community contd.

● Some numbers from 2014

● Approx. 175 IRC participants

● Two main mailing lists reach ~600 emails/month

● 100 active users and 60 active developers posting to the lists

● Around 2200 patches merged in the master branch

● Patches from ~90 developers were included

Current stable releases

● Maintenance of three minor releases

● 3.6, 3.5 and 3.4

● Bug fixes only; non-intrusive features when in high demand

● Patches get backported to fix reported bugs

Integration with GlusterFS

● More projects built and enhanced around the GlusterFS ecosystem – dockit, gluster-deploy, gluster-nagios, glusterfsiostat, puppet-gluster to name a few.

● Improved integration with broader ecosystem projects like Ambari, NFS-Ganesha, OpenStack, oVirt and Samba.

What is GlusterFS?

● A general purpose scale-out distributed file system.

● Aggregates storage exports over network interconnect to provide a single unified namespace.

● Filesystem is stackable and completely in userspace.

● Layered on disk file systems that support extended attributes.

Typical GlusterFS Deployment

● Global namespace

● Scale-out storage building blocks

● Supports thousands of clients

● Access using GlusterFS native, NFS, SMB and HTTP protocols

● Linear performance scaling

GlusterFS Architecture – Foundations

● Software only, runs on commodity hardware

● No external metadata servers

● Scale-out with Elasticity

● Extensible and modular

● Deployment agnostic

● Unified access

● Largely POSIX compliant

Concepts & Algorithms

GlusterFS concepts – Trusted Storage Pool

● Trusted Storage Pool (cluster) is a collection of storage servers.

● Trusted Storage Pool is formed by invitation – “probe” a new member from the cluster and not vice versa.

● Logical partition for all data and management operations.

● Membership information used for determining quorum.

● Members can be dynamically added and removed from the pool.
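
The pool rules above can be sketched as a toy Python model. This is only an illustration, not glusterd's data structures: the class name, the node names and the majority rule used for quorum are assumptions. In the gluster CLI the two membership operations are `gluster peer probe` and `gluster peer detach`.

```python
class TrustedStoragePool:
    """Toy model of a trusted storage pool (hypothetical, for illustration)."""

    def __init__(self, first_node):
        self.members = {first_node}

    def probe(self, from_node, new_node):
        # Invitation only: an existing member probes the newcomer, never the reverse.
        if from_node not in self.members:
            raise PermissionError(f"{from_node} is not a pool member and cannot probe")
        self.members.add(new_node)

    def detach(self, node):
        # Members can be removed dynamically.
        self.members.discard(node)

    def has_quorum(self, reachable):
        # Assumed rule: strictly more than half of the members must be reachable.
        alive = self.members & set(reachable)
        return len(alive) * 2 > len(self.members)


pool = TrustedStoragePool("node1")
pool.probe("node1", "node2")
pool.probe("node2", "node3")
print(sorted(pool.members))                  # ['node1', 'node2', 'node3']
print(pool.has_quorum({"node1", "node2"}))   # True
print(pool.has_quorum({"node3"}))            # False
```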

GlusterFS concepts – Trusted Storage Pool

[Figure: a probe is sent between Node1 and Node2; once the probe is accepted, Node 1 and Node 2 are peers in a trusted storage pool.]

GlusterFS concepts – Trusted Storage Pool

[Figure: Node1, Node2 and Node3 form a trusted storage pool; a detach operation removes a node from the pool.]

GlusterFS concepts - Bricks

● A brick is the combination of a node and an export directory, e.g. hostname:/dir

● Each brick inherits the limits of the underlying filesystem

● No limit on the number of bricks per node

● Ideally, each brick in a cluster should be of the same size

[Figure: three storage nodes exporting 3 bricks (/export1 to /export3), 5 bricks (/export1 to /export5) and 3 bricks (/export1 to /export3) respectively.]
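
A minimal sketch of the hostname:/dir brick notation in Python. The `Brick` type, the `parse_brick` helper and the node names are hypothetical; the brick counts mirror the 3/5/3 layout of the slide.

```python
from collections import Counter
from typing import NamedTuple


class Brick(NamedTuple):
    host: str   # storage node
    path: str   # export directory on that node


def parse_brick(spec: str) -> Brick:
    """Split a 'hostname:/dir' string into its node and export directory."""
    host, sep, path = spec.partition(":")
    if not sep or not host or not path.startswith("/"):
        raise ValueError(f"expected hostname:/dir, got {spec!r}")
    return Brick(host, path)


# Hypothetical nodes exporting 3, 5 and 3 bricks.
specs = (
    [f"node1:/export{i}" for i in range(1, 4)]
    + [f"node2:/export{i}" for i in range(1, 6)]
    + [f"node3:/export{i}" for i in range(1, 4)]
)
bricks = [parse_brick(s) for s in specs]
per_node = Counter(b.host for b in bricks)
print(dict(per_node))   # {'node1': 3, 'node2': 5, 'node3': 3}
```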

GlusterFS concepts - Volumes

● A volume is a logical collection of bricks.

● Volume is identified by an administrator provided name.

● Volume is a mountable entity and the volume name is provided at the time of mounting.

– mount -t glusterfs server1:/<volname> /my/mnt/point

● Bricks from the same node can be part of different volumes

GlusterFS concepts - Volumes

[Figure: Node1, Node2 and Node3 each export /export/brick1 and /export/brick2; the bricks are grouped into two volumes, music and Videos.]

Volume Types

➢ Type of a volume is specified at the time of volume creation

➢ Volume type determines how and where data is placed

➢ Following volume types are supported in glusterfs:

a) Distribute

b) Stripe

c) Replication

d) Distributed Replicate

e) Striped Replicate

f) Distributed Striped Replicate

g) Dispersed

h) Distributed dispersed

Distributed Replicated Volume
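
A rough Python sketch of how a distributed replicated volume places data, under two assumptions: consecutive bricks in the brick list form one replica set, and a file name is hashed to pick the set. CRC32 modulo the number of sets is a stand-in for illustration; GlusterFS's distribute translator uses its own hash and per-directory hash ranges. The brick names and helper functions are hypothetical.

```python
import zlib


def replica_sets(bricks, replica):
    """Group consecutive bricks into replica sets of size `replica`."""
    if len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def place(filename, sets):
    """Pick the replica set for a file; every brick in the set stores a copy."""
    index = zlib.crc32(filename.encode("utf-8")) % len(sets)  # stand-in hash
    return sets[index]


bricks = [
    "node1:/export/brick1",
    "node2:/export/brick1",
    "node3:/export/brick1",
    "node4:/export/brick1",
]
sets = replica_sets(bricks, replica=2)
for name in ("a.mp3", "b.mp3", "c.mp3"):
    print(name, "->", place(name, sets))
```

Distribution decides which replica set a file lands in; replication then keeps a full copy on each brick of that set.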

GlusterFS 3.6

● Better SSL support

● Heterogeneous bricks

● Erasure coding

● Meta translator

● Volume snapshots and user serviceability

GlusterFS 3.6 contd.

● AFRv2

● RDMA transport for GlusterFS volumes

Better SSL support

● SSL support for management plane

● SSL for authorizing and authenticating access to volumes.

● Paves the way for fine-grained access to volumes in the storage pool*.

● Makes self-service style management at a volume-level possible*.

* - Not implemented yet; technically possible

Heterogeneous bricks

● Allows distribution of data to account for bricks of different sizes

● Uniform distribution can potentially penalise smaller bricks with more allocations

● Changes were made to the DHT (distribute) translator
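
One way to picture size-aware distribution is to split a hash space among bricks in proportion to their capacity, so a larger brick owns a larger range. The Python sketch below is an illustration only, not the DHT translator's code: the 32-bit CRC32 hash, the brick names and the sizes are assumptions.

```python
import bisect
import zlib

HASH_SPACE = 2 ** 32  # CRC32 values fall in [0, 2**32)


def weighted_layout(brick_sizes):
    """Return [(exclusive_upper_bound, brick)], ranges proportional to size."""
    total = sum(brick_sizes.values())
    layout, covered = [], 0
    for brick, size in brick_sizes.items():
        covered += size
        layout.append((HASH_SPACE * covered // total, brick))
    return layout


def locate(filename, layout):
    """Find the brick whose hash range contains the file name's hash."""
    h = zlib.crc32(filename.encode("utf-8"))
    bounds = [upper for upper, _ in layout]
    return layout[bisect.bisect_right(bounds, h)][1]


sizes_tb = {"node1:/export1": 1, "node2:/export1": 3}  # hypothetical sizes in TB
layout = weighted_layout(sizes_tb)
lower = 0
for upper, brick in layout:
    print(f"{brick}: [{lower}, {upper})")
    lower = upper
print(locate("report.pdf", layout))
```

With these sizes the 3 TB brick owns three quarters of the hash space, so it receives roughly three times as many files as the 1 TB brick.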

Erasure Coding

● Provides resilience to brick failures using erasure codes

● Configurable redundancy and fault tolerance

● Reduces disk space consumption in comparison to replicated volumes
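
The space saving can be checked with simple arithmetic. The sketch below compares usable capacity for a replicated and a dispersed layout over the same bricks; the function names and the six 1 TB bricks are assumptions chosen for illustration.

```python
def usable_replicated(brick_count, brick_size, replica):
    """Each replica set stores one brick's worth of unique data."""
    return brick_count // replica * brick_size


def usable_dispersed(brick_count, brick_size, redundancy):
    """A dispersed set gives up `redundancy` bricks' worth of space to coding."""
    return (brick_count - redundancy) * brick_size


# Six bricks of 1 TB each.
print(usable_replicated(6, 1, replica=3))     # 2 (TB), each set keeps 3 copies
print(usable_dispersed(6, 1, redundancy=2))   # 4 (TB), any 2 bricks may fail
```

Both layouts keep data available after losing two bricks (two per replica set in the replicated case), but the dispersed layout leaves twice the usable space.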

Meta xlator

● Provides a /proc like interface to GlusterFS runtime

● Allows users to inspect internals of translators present in GlusterFS runtime 'stack'.

● E.g., cat /mnt/glusterfs/.meta/version to fetch the version of the glusterfs mount process

● tree /mnt/glusterfs/.meta/graphs/active to browse the active translator graph
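
Because the meta translator exposes plain files, the same information can be read from a script. A minimal Python sketch, assuming a volume mounted at /mnt/glusterfs as in the slide; `read_meta` is a hypothetical helper, and only the .meta/version entry shown above is used.

```python
from pathlib import Path


def read_meta(mount_point, entry):
    """Read one file under <mount_point>/.meta and return its text, stripped."""
    return (Path(mount_point) / ".meta" / entry).read_text().strip()


mount = "/mnt/glusterfs"  # mount point taken from the slide
try:
    print(read_meta(mount, "version"))
except OSError as exc:
    # No GlusterFS mount (or no meta translator) at this path.
    print(f"could not read {mount}/.meta/version: {exc}")
```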

AFRv2

● Refactored AFR implementation

● Improved performance of the healing process

● Paves the way for better introspection into the healing process. More in 3.7

RDMA Support

● Minor fixes that made RDMA transport more usable for GlusterFS volumes

GlusterFS 3.7

● Small file performance

● Data classification

● Bit-rot detection

● Better OpenStack integration, e.g. Manila

Small file performance

● Multi-threaded epoll – Transport layer

● Caching stat and xattr calls on small files – Storage layer

● Migrate .glusterfs to SSDs – Physical layer

● Batching of RPCs per file access

Data Classification

● Mapping file characteristics to subvolume characteristics

● File characteristics – size, age, access rate, type (extension)

● Subvolume characteristics – physical location, storage type (SSD, disk), encoding method (deduplicated, erasure coded)

● User provided mappings via 'tags'

● Implementation using 'DHT over DHT' pattern

Bit-rot detection

● Silent disk corruption

● Useful for archival or WORM workloads

● Lazy, policy-based and incremental checksum computation
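
The idea of signing files and scrubbing them later can be sketched in Python. This is a toy model under stated assumptions, not the planned GlusterFS design: SHA-256 as the checksum, an in-memory dict as the signature store, and the modification time as the marker that separates legitimate writes from silent corruption are all choices made for illustration.

```python
import hashlib
import os
import tempfile


def checksum(path):
    """SHA-256 of the file contents, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sign(path, store):
    """Record the checksum together with the mtime it was computed at."""
    store[path] = (os.stat(path).st_mtime_ns, checksum(path))


def scrub(path, store):
    """Return 'unsigned', 'modified', 'ok' or 'corrupt' for one file."""
    if path not in store:
        return "unsigned"
    signed_mtime, signed_sum = store[path]
    if os.stat(path).st_mtime_ns != signed_mtime:
        return "modified"  # written since signing; should be re-signed
    return "ok" if checksum(path) == signed_sum else "corrupt"


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "archive.bin")
    with open(path, "wb") as f:
        f.write(b"important data")
    store = {}
    sign(path, store)
    before = scrub(path, store)
    # Simulate silent corruption: change the bytes, then restore the old mtime.
    st = os.stat(path)
    with open(path, "wb") as f:
        f.write(b"important dat\x00")
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))
    after = scrub(path, store)
print(before, after)
```

Signing only files that have stopped changing is what makes the scheme lazy and incremental, and it fits archival or WORM data that is written once and then left alone.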

Better OpenStack integration

● Manila – File share as a service

● Cinder – Block storage as a service

● Swift – Object storage as a service

● Sahara – Hadoop as a service

● Targeted at the OpenStack Kilo release

GlusterFS 4.0 Vision

● To be the best-in-class distributed commodity storage with unified access to data

GlusterFS 4.0 Vision

● Community scaling – design by community

● Node scaling

● Technology scaling

● Development process scaling

GlusterFS 4.0 Vision

● 'Thousand node glusterd'

● DHT scalability

● NSR – Log based, chain replication

● Better brick management

● Split Network

● ... and many more. See http://www.gluster.org/community/documentation/index.php/Planning40

Q&A

Resources

Mailing lists: [email protected]@gluster.org

IRC: #gluster and #gluster-dev on freenode

Links:

http://www.gluster.org

http://hekafs.org

http://forge.gluster.org

http://www.gluster.org/community/documentation/index.php/Arch