(Dis)Advantages of DHT: A Perspective with Raghavendra Gowdappa



Page 1

(Dis)Advantages of DHT: A perspective

Raghavendra Gowdappa, Principal Software Engineer, Red Hat Inc

Page 2

Pain points

Replicated directory structure on all subvols

Need transactions for a consistent directory structure across subvols
  Current status: solution using locks, WIP
  Cost: O(1)
  Crash consistency is a challenge

Need path information to “heal” the directory structure during add-brick scenarios
  Use case: gNFS
  Current status: fixed

Readdir has to read a lot of redundant dentries
  One of the prime suspects for poor readdir performance; see “readdir-optimize”
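Since every subdirectory exists on every subvol, merging per-subvol listings reads each subdirectory dentry once per subvol but may return it only once. Below is a minimal Python sketch of that merge, not DHT's code: it assumes each subvol's listing is a list of (name, is_directory) pairs and keeps directory entries from the first subvol only. The real selection logic in DHT differs and is not modeled here.

```python
from typing import List, Tuple

Entry = Tuple[str, bool]  # (name, is_directory)


def merge_listing(per_subvol: List[List[Entry]]) -> Tuple[List[str], int]:
    """Merge per-subvol listings of one directory (illustrative only).

    Files live on exactly one subvol; every subdirectory is present on all of
    them. Directory entries are kept from the first subvol only, an assumption
    made for this sketch. Returns the merged names and the number of redundant
    dentries that were read and then discarded.
    """
    names: List[str] = []
    redundant = 0
    for index, entries in enumerate(per_subvol):
        for name, is_dir in entries:
            if is_dir and index != 0:
                redundant += 1  # read from a subvol, then thrown away
                continue
            names.append(name)
    return names, redundant


listing = [
    [("docs", True), ("a.txt", False)],
    [("docs", True), ("b.txt", False)],
    [("docs", True)],
]
names, redundant = merge_listing(listing)
print(names, redundant)  # ['docs', 'a.txt', 'b.txt'] 2
```

For a directory with d subdirectories on n subvols, this sketch discards (n - 1) * d dentries per full listing, all of which still had to be read.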

Page 3

Replicated directory structure...

Directory ops have to visit all nodes
  All subvols have to be online for rmdir and renamedir to succeed
  Mkdir can complete with just the hashed subvol online
  Ops are NOT serialized across subvols
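The availability rules on this slide can be written as two predicates. This is a minimal sketch; the subvol names and the dict-of-booleans view of which subvols are online are hypothetical.

```python
from typing import Dict


def can_mkdir(online: Dict[str, bool], hashed: str) -> bool:
    # mkdir only needs the hashed subvol; copies on the other subvols can be
    # created later, e.g. by directory heal.
    return online[hashed]


def can_rmdir_or_renamedir(online: Dict[str, bool]) -> bool:
    # rmdir and renamedir must change every copy of the directory, so every
    # subvol has to be reachable.
    return all(online.values())


online = {"subvol-0": True, "subvol-1": False, "subvol-2": True}
print(can_mkdir(online, "subvol-0"))   # True
print(can_rmdir_or_renamedir(online))  # False
```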

Page 4

Pain points (contd)

Lack of a metadata server!!!
  Metadata like permissions, user xattrs and POSIX locks on directories
  Replication is difficult and complex

However, there is an MDS: the hashed subvol
  What if it's down?
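A directory layout assigns each subvol a range of a 32-bit hash space, and the subvol whose range contains hash(name) is the hashed subvol, which the slide describes as acting as the MDS. This is a minimal sketch assuming inclusive ranges and using zlib.crc32 as a stand-in for GlusterFS's own hash function; the layout values and subvol names are hypothetical.

```python
import zlib
from typing import List, Optional, Tuple

# (subvol name, start, stop): inclusive ranges over the 32-bit hash space.
Layout = List[Tuple[str, int, int]]


def name_hash(name: str) -> int:
    # Stand-in hash for this sketch; GlusterFS uses its own hash, not CRC32.
    return zlib.crc32(name.encode()) & 0xFFFFFFFF


def hashed_subvol(layout: Layout, name: str) -> Optional[str]:
    h = name_hash(name)
    for subvol, start, stop in layout:
        if start <= h <= stop:
            return subvol
    return None  # hole in the layout: no subvol owns this hash


layout = [
    ("subvol-0", 0x00000000, 0x55555554),
    ("subvol-1", 0x55555555, 0xAAAAAAA9),
    ("subvol-2", 0xAAAAAAAA, 0xFFFFFFFF),
]
print(hashed_subvol(layout, "projects"))
```

If the subvol returned here is down, operations that rely on it as the MDS have no obvious alternative, which is the question the slide raises.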

Page 5

Pain points (contd)

Directory is spread on all subvols
  For reading an entire directory, all subvols need to be visited
  Support for rewinddir, and the need to serialize readdir across subvols
  Serialized readdir is a bottleneck on scalability, e.g., listing an empty directory on large volumes
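To let rewinddir and seekdir resume a listing, the offset handed back to the application has to identify both the subvol and the position within it, which pushes toward reading subvols one after another. This is a toy model of that serialization, assuming in-memory listings and a hypothetical offset encoding (subvol index in the high 32 bits); GlusterFS has its own offset transform, which this does not reproduce.

```python
from typing import List, Tuple

# Hypothetical offset encoding for this sketch only: subvol index in the high
# 32 bits, position within that subvol in the low 32 bits.
SHIFT = 32
LOW_MASK = (1 << SHIFT) - 1


def encode(subvol: int, pos: int) -> int:
    return (subvol << SHIFT) | pos


def decode(off: int) -> Tuple[int, int]:
    return off >> SHIFT, off & LOW_MASK


def readdir(subvols: List[List[str]], off: int,
            count: int) -> Tuple[List[Tuple[str, int]], int]:
    """Return up to `count` (name, next_offset) pairs starting at `off`,
    plus how many subvols were contacted. Subvols are read strictly in order,
    so an offset always points into exactly one of them."""
    subvol, pos = decode(off)
    out: List[Tuple[str, int]] = []
    contacted = 0
    while subvol < len(subvols) and len(out) < count:
        contacted += 1  # one round trip to this subvol
        entries = subvols[subvol]
        while pos < len(entries) and len(out) < count:
            out.append((entries[pos], encode(subvol, pos + 1)))
            pos += 1
        if pos >= len(entries):
            subvol, pos = subvol + 1, 0  # advance only once this subvol is exhausted
    return out, contacted


# rewinddir is just a restart from offset 0.
print(readdir([[], [], [], []], 0, 10))  # ([], 4)
```

The last call shows the scalability point: listing an empty directory spread over four subvols still costs four sequential round trips.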

Page 6

Pain points (contd)

Stale in-memory directory layouts
  Directory heal by clients
  Fix-layout by rebalance process(es)
  Each rebalance process arriving at a different layout due to layout optimization

Page 7

Stale layouts...

Two possible solutions
  Single global layout
    Status: PoC
  Independent layouts with synchronization during layout modification and consumption
    Cost: O(1) for layout readers, O(subvols) for layout writers

Back off from healing if not necessary

  Only a fix-layout will override a “well-formed” layout
  A fix-layout won’t alter a layout that is spread across all subvols
  Status: solution present in codebase
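One way to express the "back off" rule: treat a layout as well-formed when its ranges tile the whole 32-bit hash space with no holes or overlaps and every subvol owns a range, and let fix-layout rewrite only layouts that fail that check. The criteria below are an assumption for illustration; the check in the codebase may differ. Subvols without a range are simply absent from the dict.

```python
from typing import Dict, List, Tuple

HASH_MAX = 0xFFFFFFFF  # top of the 32-bit hash space


def is_well_formed(layout: Dict[str, Tuple[int, int]], subvols: List[str]) -> bool:
    """Assumed criteria: every subvol owns an inclusive range, and the ranges
    tile [0, HASH_MAX] with no holes or overlaps."""
    if set(layout) != set(subvols):
        return False  # some subvol has no range (e.g. after add-brick)
    expected_start = 0
    for start, stop in sorted(layout.values()):
        if start != expected_start or stop < start:
            return False  # hole, overlap or inverted range
        expected_start = stop + 1
    return expected_start == HASH_MAX + 1


def should_fix_layout(layout: Dict[str, Tuple[int, int]], subvols: List[str]) -> bool:
    # Back off: leave a layout that already spans all subvols untouched.
    return not is_well_formed(layout, subvols)


layout = {"s0": (0, 0x7FFFFFFF), "s1": (0x80000000, 0xFFFFFFFF)}
print(should_fix_layout(layout, ["s0", "s1"]))        # False
print(should_fix_layout(layout, ["s0", "s1", "s2"]))  # True: s2 was just added
```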

A fix-layout can still override a well-formed layout
  Stale layout issues seen during rebalance/fix-layout
  Algorithm to check cache correctness in all dentry fops that consume the layout
  Status: partially implemented. Cost: O(1) for the happy case
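The reader/writer cost split on this slide can be sketched with a per-directory generation number. A writer rewrites the layout on every subvol and bumps the generation (O(subvols)). A dentry fop compares its cached generation with the one on the subvol it already visits and re-reads the layout only on mismatch (O(1) in the happy case). Everything here (Subvol, CachedLayout, the generation field) is hypothetical, and the sketch is single-threaded, so it leaves out the synchronization needed to stop a reader from seeing a half-written layout.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Range = Tuple[int, int]


@dataclass
class Subvol:
    # Hypothetical per-subvol copy of one directory's layout.
    layout_range: Range = (0, 0)
    generation: int = 0


@dataclass
class CachedLayout:
    # Hypothetical client-side cache of that directory's layout.
    generation: int = -1
    ranges: Dict[str, Range] = field(default_factory=dict)


def write_layout(subvols: Dict[str, Subvol], new: Dict[str, Range]) -> None:
    """Writer: touches every subvol, O(subvols)."""
    generation = max(sv.generation for sv in subvols.values()) + 1
    for name, sv in subvols.items():
        sv.layout_range = new[name]
        sv.generation = generation


def layout_for_fop(cache: CachedLayout, subvols: Dict[str, Subvol],
                   visited: str) -> Tuple[Dict[str, Range], bool]:
    """Reader: compare generations on the subvol the fop already visits.
    Returns the layout to use and whether the cache had to be refreshed."""
    current = subvols[visited].generation
    if current == cache.generation:
        return cache.ranges, False  # happy case: O(1)
    # Stale cache: re-read every copy, O(subvols).
    cache.ranges = {name: sv.layout_range for name, sv in subvols.items()}
    cache.generation = current
    return cache.ranges, True


subvols = {"s0": Subvol(), "s1": Subvol()}
write_layout(subvols, {"s0": (0, 0x7FFFFFFF), "s1": (0x80000000, 0xFFFFFFFF)})
cache = CachedLayout()
print(layout_for_fop(cache, subvols, "s0")[1])  # True: first use fills the cache
print(layout_for_fop(cache, subvols, "s0")[1])  # False: generation unchanged
```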

Page 8

Pain points (contd)

Brick-full scenario: is distribution really transparent?

Min-free-disk by itself is not sufficient

Solutions
  Sharding
  On-the-fly migration during writes
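Why min-free-disk alone falls short: it only influences where a new file is created, while an existing file's data stays on one brick as it grows. This is a minimal sketch with a hypothetical Brick type and an assumed placement policy (fall back to the brick with the most free space); it does not model any pointer file DHT may leave on the hashed subvol.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Brick:
    size: int
    used: int = 0

    @property
    def free_fraction(self) -> float:
        return (self.size - self.used) / self.size


def place_new_file(bricks: Dict[str, Brick], hashed: str, min_free: float) -> str:
    # Honour the hashed brick unless it is below the min-free-disk threshold;
    # otherwise pick the brick with the most free space (assumed policy).
    if bricks[hashed].free_fraction >= min_free:
        return hashed
    return max(bricks, key=lambda name: bricks[name].free_fraction)


def write(bricks: Dict[str, Brick], where: str, nbytes: int) -> bool:
    # Without sharding or migration, a file's data stays on one brick,
    # so a growing file fails once that brick is full.
    brick = bricks[where]
    if brick.used + nbytes > brick.size:
        return False  # ENOSPC, even if other bricks have room
    brick.used += nbytes
    return True


bricks = {"b0": Brick(size=100), "b1": Brick(size=100)}
first = place_new_file(bricks, "b0", min_free=0.10)  # "b0"
write(bricks, first, 95)                              # b0 now 95% full
print(place_new_file(bricks, "b0", min_free=0.10))    # b1: new files avoid b0
print(write(bricks, first, 10))                       # False: the old file cannot grow
```

New files are steered away from b0, yet the file already on b0 cannot grow even though b1 is empty; sharding and on-the-fly migration are the slide's proposed ways around this.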

Page 9

Pain points (contd)

Readdir and rebalance
  Duplicate listings and missing files in readdir listings during rebalance

Rebalance speed
  No estimation of completion time
  Lack of granularity

Missing directories in readdir listings during rebalance
  Healing of directories and data movement happen together
  (Sub)directories are not present on the hashed subvol after fix-layout on the parent until the entire subtree is migrated
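Because readdir walks subvols one at a time, a file that rebalance moves between subvols mid-listing can be seen twice or not at all. This is a toy model: it applies a single migration right after a chosen subvol has been listed and ignores link files and any filtering DHT performs.

```python
from typing import List, Tuple

Migration = Tuple[str, int, int]  # (file name, source subvol, destination subvol)


def list_during_migration(subvols: List[List[str]], after: int,
                          move: Migration) -> List[str]:
    """Read subvols one at a time; apply `move` right after subvol `after`
    has been read, as if rebalance raced with the listing."""
    state = [list(entries) for entries in subvols]  # leave the input untouched
    listing: List[str] = []
    for index, entries in enumerate(state):
        listing.extend(entries)
        if index == after:
            name, src, dst = move
            state[src].remove(name)
            state[dst].append(name)
    return listing


print(list_during_migration([["a"], ["b"], []], 0, ("a", 0, 2)))  # ['a', 'b', 'a']
print(list_during_migration([["a"], ["b"], []], 0, ("b", 1, 0)))  # ['a']
```

Moving a file forward to a subvol not yet read produces a duplicate; moving it back to a subvol already read makes it disappear from the listing.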

Page 10

Are there any advantages at all?

Has been in production for years

Distributed workload for dentry ops on files and for readdir

Colocation of data and metadata for files

No single point of failure. Conditions apply :)

Can support really large directories.

Page 11

(advantages…)

If some subvols are down during directory creation, files don’t go into them
  Good for permanently down subvols
  Bad for transient failures
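This is a sketch of why files avoid subvols that were down at mkdir: if the new directory's layout spreads the hash space only over the subvols that are online, no name can hash to the others. The even split and the function name are assumptions for illustration.

```python
from typing import Dict, List, Tuple

HASH_SPACE = 1 << 32  # size of the 32-bit hash space


def layout_at_mkdir(subvols: List[str],
                    online: Dict[str, bool]) -> Dict[str, Tuple[int, int]]:
    """Split the hash space evenly across subvols online at mkdir time;
    offline subvols get no range, so no file name can hash to them."""
    up = [s for s in subvols if online[s]]
    if not up:
        raise RuntimeError("no subvol online")
    chunk = HASH_SPACE // len(up)
    layout: Dict[str, Tuple[int, int]] = {}
    for i, s in enumerate(up):
        start = i * chunk
        # The last range absorbs any remainder so the space is fully covered.
        stop = HASH_SPACE - 1 if i == len(up) - 1 else start + chunk - 1
        layout[s] = (start, stop)
    return layout


print(layout_at_mkdir(["s0", "s1", "s2"], {"s0": True, "s1": False, "s2": True}))
```

With s1 down, s0 owns 0 to 0x7FFFFFFF and s2 owns 0x80000000 to 0xFFFFFFFF. Until a fix-layout runs, s1 receives no new files in this directory, which suits a permanently removed subvol but wastes its capacity after a transient outage.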

Presence of the directory structure on all subvols makes implementation of features like quota trivial

One can just mount a subvol of DHT as an independent volume

Page 12

Thank you