
Design of an Exact Data Deduplication Cluster

Jürgen Kaiser (JGU) Dirk Meister (JGU) Andre Brinkmann (JGU) Sascha Effert (christmann)

Outline

• Deduplication (short)
• System Overview
• Fault Tolerance
• Inter-node Communication
• Evaluation
• Conclusion


Deduplication (short)

• Storage savings approach
  – Removes coarse-grained redundancy

• Process overview (see the sketch below)
  1. Split the data into chunks
  2. Fingerprint each chunk with a cryptographic hash
  3. Check whether the fingerprint is already stored in the index (Chunk Index)
  4. Store new chunk data (Storage)
  5. Store the block-chunk mapping (Block Index)
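A minimal sketch of this write path, assuming fixed-size chunking and in-memory dictionaries for the indexes (the actual system uses content-defined chunking and persistent, partitioned indexes):

```python
import hashlib

CHUNK_SIZE = 8 * 1024        # 8 KB, at the small end of the targeted range

chunk_index = {}             # fingerprint -> storage location
container_storage = {}       # storage location -> chunk data
block_index = {}             # block id -> ordered fingerprints

def write_block(block_id: int, data: bytes) -> None:
    fingerprints = []
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]            # 1. chunking
        fp = hashlib.sha1(chunk).hexdigest()          # 2. fingerprinting
        if fp not in chunk_index:                     # 3. chunk index lookup
            location = len(container_storage)
            container_storage[location] = chunk       # 4. store new chunk
            chunk_index[fp] = location
        fingerprints.append(fp)
    block_index[block_id] = fingerprints              # 5. block-chunk mapping
```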

Clustered Deduplication

• Single-node deduplication systems
  – Limited scaling

• Just a Bunch of Deduplication Systems
  – Lots of independent deduplication systems
  – Complex load balancing, migration, and management
  – No cross-data-set sharing

• Clustering is a promising way to scale deduplication


Exact deduplication

• Term: “exact deduplication”
  – Detects all duplicate chunks
  – Same deduplication quality as a single-node system
  – But larger and faster

• Goal
  – Scalable, exact deduplication
  – Small chunk sizes (8–16 KB)

• Contributions
  – An architecture combining these properties
  – An evaluation using a prototype
  – An exploration of the limitations


System Design Overview


Storage Organization

• Shared nothing (direct-attached)
  – Replication/erasure coding
  – Chunk data over the network

• Shared storage (SAN)
  – Complex locking schemes
  – Scaling issues

• Our approach: a partition is only accessed by a single node
  – No short-term locking
  – Sharing is used for
    • Fault tolerance
    • Load balancing


Partition types

• Chunk Index Partitions
• Block Index Partitions
• Container Partitions
• Container Metadata Partitions

• The partition types are handled differently with respect to
  – Fault tolerance
  – Load balancing
  – Data assignment

Chunk Index Partition

• Contains part of the distributed chunk index
• Chunks are assigned by SHA-1 prefix (sketch below)

• Performance sensitive
  – SSD storage
  – 100,000s of requests/s during writes
  – Multiple requests issued concurrently
  – Index not updated directly
    • Write-ahead log
    • Dirty, uncommitted state in memory
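A hypothetical sketch of the prefix assignment; the partition count and the use of the first four bytes are assumptions, the slides only state that chunks are assigned by SHA prefix:

```python
import hashlib

NUM_CHUNK_INDEX_PARTITIONS = 64   # assumed partition count

def chunk_index_partition(fingerprint: bytes) -> int:
    # Interpret the first 4 bytes of the SHA-1 fingerprint as an
    # integer; a uniform hash output yields an even distribution.
    prefix = int.from_bytes(fingerprint[:4], "big")
    return prefix % NUM_CHUNK_INDEX_PARTITIONS

fp = hashlib.sha1(b"example chunk data").digest()
print(chunk_index_partition(fp))
```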


Block Index Partition

• Contains the block index
• Each iSCSI target is assigned to a partition
• Only the node that a partition is assigned to exports the iSCSI target
  – An extension like clustered iSCSI could be implemented

• High locality of access
  – Modest performance requirements
  – Currently on SSD storage

Container Partition

• Stored on HDD storage
• Stores part of the container storage
• Chunk data is stored in containers

• Assignment
  – Prefer local partitions
  – One local partition for writing
  – Chunk data is never transferred over the network

• Container Metadata Partition
  – Metadata and write-ahead log are stored on SSD

Fault tolerance

• Storage reliability
  – RAID 6 (or double mirroring)

• Node crash (basic idea):
  – Failure detection using e.g. keep-alive messages
  – Partition remapping by the cluster leader
    • New leader election if the cluster leader fails
    • Based on the ZooKeeper system (sketch below)
  – Recovery strategy depends on the partition type

• Load balancing
  – Strongly related to fault tolerance
  – Details in the paper
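A sketch of ZooKeeper-based leader election using the kazoo Python client; the ensemble addresses, paths, and node identifier are hypothetical, the slides only state that election is based on ZooKeeper:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")  # assumed ensemble
zk.start()

def act_as_leader():
    # The elected leader would watch node liveness (e.g. via ephemeral
    # znodes that disappear when a node's session dies) and remap the
    # partitions of failed nodes.
    print("elected cluster leader; remapping partitions of failed nodes")

# run() blocks until this node wins the election; if the current
# leader fails, one of the remaining contenders takes over.
election = zk.Election("/cluster/leader-election", identifier="node-7")
election.run(act_as_leader)
```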


Recovery Strategy

• Chunk Index
  – Needs to be up very quickly, so no log replay on restart
  – Lookups may return “false negatives” (illustrated below)
  – State is cleaned up during a background log replay

• Block Index
  – Outdated results are not acceptable
  – Replay of the write-ahead log recovers the latest state
  – Downtime of around a minute

• Container Storage
  – Only a small non-committed state, so no issue
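An illustration of why the chunk index can skip log replay, a sketch under the assumption that unreplayed write-ahead-log entries simply look like misses:

```python
def lookup_after_crash(chunk_index: dict, fp: str):
    # Entries committed before the crash are found; entries still only
    # in the unreplayed write-ahead log are not -> a "false negative".
    # The caller then stores the chunk a second time, which costs some
    # capacity but never loses or corrupts data.
    return chunk_index.get(fp)

def background_log_replay(chunk_index: dict, wal_entries):
    # Re-apply logged inserts in the background; duplicates created by
    # false negatives can be detected and garbage-collected here.
    for fp, location in wal_entries:
        chunk_index.setdefault(fp, location)
```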


Inter-node Communication

• Main communication pattern (sketch below)
  – For all chunks in a write request:
    • Ask the responsible node
  – Requests to the same node are aggregated
  – Each message is small (20–100 bytes)
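A sketch of this pattern; send_lookup() is a hypothetical stand-in for the real RPC, and the node assignment reuses the assumed prefix mapping from above:

```python
from collections import defaultdict

NUM_NODES = 24

def responsible_node(fp: bytes) -> int:
    return int.from_bytes(fp[:4], "big") % NUM_NODES

def send_lookup(node: int, fps: list) -> dict:
    # Placeholder for the real RPC; returns fingerprint -> is_duplicate.
    return {fp: False for fp in fps}

def lookup_chunks(fingerprints):
    batches = defaultdict(list)
    for fp in fingerprints:
        batches[responsible_node(fp)].append(fp)   # aggregate per node
    results = {}
    for node, fps in batches.items():
        results.update(send_lookup(node, fps))     # one message per node
    return results
```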

Evaluation

• Prototype implementation
• 60-node cluster
  – 24 as deduplication nodes
  – 24 for workload generation (clients)
  – 12 for shared storage simulation (6 SSD, 6 SAN)
  – Gigabit Ethernet interconnect
  – IPoIB iSCSI to the storage nodes

• Load generation
  – Based on patterns/probability distributions from traces
  – “1st generation”: empty deduplication system
  – “2nd generation”: later backup runs

[Figure: Prototype throughput in MB/s (0–6000) for 1, 2, 4, 8, 16, and 24 nodes, showing total and per-node throughput for 1st and 2nd generation backups]

Communication Limit

• Throughput relies on exchanging 100,000s of messages/s

• More nodes
  – More write requests
  – Higher chunk lookup “fan-out” (worked estimate below)
  – Linearly more capacity to process messages
  – Sub-linear scaling

• But only for small chunk sizes and small clusters
  – The fan-out no longer grows in larger clusters
  – Scaling becomes more linear as the cluster grows
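A worked estimate of the per-request fan-out, using the standard balls-into-bins expectation (an assumption; the paper's exact model may differ). A request of k chunks, each mapped uniformly to one of n nodes, contacts E = n · (1 − (1 − 1/n)^k) distinct nodes:

```python
def expected_messages(n_nodes: int, request_size: int, chunk_size: int) -> float:
    # Expected number of distinct nodes (= aggregated lookup messages)
    # contacted by one write request.
    k = request_size / chunk_size
    return n_nodes * (1 - (1 - 1 / n_nodes) ** k)

# For 1 MB requests on 8 nodes: 8 KB chunks touch nearly all nodes
# (fan-out ~8.0), while 256 KB chunks touch only ~3.3 of them.
for chunk_kb in (8, 64, 256):
    print(chunk_kb, round(expected_messages(8, 1 << 20, chunk_kb * 1024), 2))
```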

Communication Limits: Results

• Measured the message counts based on the communication pattern
• Estimated throughput based on the measured message rates
• Performance excluding storage, chunking, …
  – Only communication

[Figure: Estimated throughput in GB/s (0–80) vs. number of nodes (2–8), one curve per chunk size from 8 KB to 256 KB, derived from the expected number of messages]

Conclusion

• Exact, inline deduplication cluster
  – Architecture
  – Prototype

• Exact deduplication clusters with small chunk sizes are possible

• However, message exchange limits scalability


THANK YOU Questions?

Architecture

Load Balancing

• Chunk Index
  – Load balancing is not necessary
  – The SHA-1 prefix ensures a good distribution

• Block Index
  – Load imbalance due to skewed access
  – Move partitions between nodes to balance the load (sketch below)
  – Move the volume mapping as a last resort

• Container Storage
  – Load imbalance due to skewed read access
  – Move partitions between nodes to balance the load
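A hypothetical sketch of the partition-moving idea: shift the hottest partition from the most loaded node to the least loaded one. The slides give only the policy; this greedy step is an assumption:

```python
def rebalance_step(node_load: dict, partitions: dict):
    # node_load: node -> total load; partitions: node -> {partition: load}
    hot = max(node_load, key=node_load.get)
    cold = min(node_load, key=node_load.get)
    if hot == cold or not partitions[hot]:
        return None                                       # nothing to move
    part = max(partitions[hot], key=partitions[hot].get)  # hottest partition
    load = partitions[hot].pop(part)
    partitions[cold][part] = load
    node_load[hot] -= load
    node_load[cold] += load
    return part, hot, cold
```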

Network Limit?

• Current limit: the nodes’ ability to receive and process messages

• The network switch can become a bottleneck
  – High-performance switches are surprisingly capable
  – Probably only an issue in larger clusters
  – No slowdown caused by the network/switch observed up to 24 nodes

Distributed Chunk Index


Deduplication Nodes

• Provide the SCSI target interface
• Process incoming SCSI requests
• Perform chunking and fingerprinting

• Contain parts of the indexes
  – Chunk Index
  – Block Index

• Can directly access parts of the stored chunk data
  – Container Storage