(Dis)Advantages of DHT: A Perspective with Raghavendra Gowdappa
(Dis)Advantages of DHT: A perspective
Raghavendra Gowdappa, Principal Software Engineer, Red Hat Inc.
Pain points
Replicated directory structure on all subvols
- Need transactions for a consistent directory structure across subvols. Current status: a lock-based solution, work in progress.
- Cost: O(1). Crash consistency is a challenge.
- Need path information to "heal" the directory structure in add-brick scenarios (use case: gNFS). Current status: fixed.
- Readdir has to read a lot of redundant dentries; one of the prime suspects for poor readdir performance. See "readdir-optimize".
Replicated directory structure: directory ops have to visit all nodes
- All subvols have to be online for rmdir and renamedir to succeed; mkdir can complete with just the hashed subvol online.
- Ops are NOT serialized across subvols.
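The asymmetry above can be sketched as follows. This is an illustrative model, not GlusterFS code: the Subvol class and function names are hypothetical. The point is that mkdir is usable as soon as the hashed subvol has the directory (missing copies are healed later), while rmdir must reach every subvol, so a single offline subvol fails it.

```python
# Illustrative sketch of DHT's mkdir/rmdir asymmetry (hypothetical names).
class Subvol:
    def __init__(self, name, online=True):
        self.name, self.online, self.dirs = name, online, set()

def dht_mkdir(path, hashed, others):
    """mkdir succeeds once the hashed subvol has the directory."""
    if not hashed.online:
        raise OSError("hashed subvol down: mkdir fails")
    hashed.dirs.add(path)          # directory exists once this succeeds
    for sv in others:              # best effort; missing copies healed later
        if sv.online:
            sv.dirs.add(path)

def dht_rmdir(path, subvols):
    """rmdir needs every subvol online, since all hold a copy."""
    if any(not sv.online for sv in subvols):
        raise OSError("all subvols must be online for rmdir")
    for sv in subvols:
        sv.dirs.discard(path)
```

Note there is no cross-subvol transaction here, which is exactly the consistency problem the slides describe.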
Pain points (contd)
- Lack of a metadata server!!! Metadata like permissions, user xattrs, and POSIX locks on directories.
- Replicating this metadata is difficult and complex.
- However, there is a de-facto MDS: the hashed subvol. What if it's down?
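A minimal sketch of how a name maps to its hashed subvol. GlusterFS uses a Davies-Meyer hash over the basename and per-directory layout ranges stored in xattrs; this sketch substitutes CRC32 and an even split purely for illustration.

```python
# Illustrative DHT-style hashing: each name maps to exactly one subvol,
# which then acts as the de-facto MDS for that name (and is a problem
# when it is down). CRC32 stands in for the real Davies-Meyer hash.
import zlib

def layout_ranges(n_subvols):
    """Split the 32-bit hash space into contiguous ranges, one per subvol."""
    step = (2**32) // n_subvols
    ranges = []
    for i in range(n_subvols):
        start = i * step
        end = (2**32 - 1) if i == n_subvols - 1 else (start + step - 1)
        ranges.append((start, end))
    return ranges

def hashed_subvol(name, ranges):
    """Return the index of the subvol whose range contains hash(name)."""
    h = zlib.crc32(name.encode()) & 0xFFFFFFFF
    for i, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return i
    raise AssertionError("ranges must cover the full hash space")
```

Because the mapping is deterministic, any operation that needs the authoritative entry for a name must reach that one subvol.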
Pain points (contd)
- A directory is spread over all subvols; to read an entire directory, all subvols need to be visited.
- Support for rewinddir requires serialization of readdir across subvols.
- Serialized readdir is a bottleneck for scalability, e.g. listing an empty directory on a large volume.
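The cost model behind that bullet can be sketched in a few lines (illustrative, not GlusterFS code): subvols are visited one after another, so even an empty directory pays at least one round trip per subvol.

```python
# Serialized readdir across subvols: one pass per subvol, in order, so a
# (subvol, offset) cursor can support rewinddir/seekdir. The round-trip
# count grows with the number of subvols even when listings are empty.
def dht_readdir(subvol_listings):
    """subvol_listings: per-subvol dentry lists, visited strictly in order."""
    entries, round_trips = [], 0
    for listing in subvol_listings:
        round_trips += 1              # at least one call per subvol
        entries.extend(listing)
    return entries, round_trips
```

With 100 subvols, listing an empty directory still costs 100 sequential round trips, which is the scalability bottleneck the slide points at.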
Pain points (contd)
Stale in-memory directory layouts
- Directory heal by clients; fix-layout by rebalance process(es).
- Each rebalance process may arrive at a different layout due to layout optimization.
Stale layouts...
Two possible solutions:
- A single global layout. Status: proof of concept.
- Independent layouts, with synchronization during layout modification and consumption. Cost: O(1) for layout readers, O(subvols) for layout writers.
Back off from healing when it is not necessary:
- Only a fix-layout will override a "well-formed" layout; a fix-layout won't alter a layout that is spread across all subvols. Status: solution present in codebase.
- A fix-layout can still override a well-formed layout, so stale-layout issues are still seen during rebalance/fix-layout.
- An algorithm to check cache correctness in all dentry fops that consume the layout. Status: partially implemented. Cost: O(1) in the happy case.
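One way to picture the cache-correctness idea is a generation check: before a dentry fop consumes the cached in-memory layout, compare a cheap version number against the on-disk one, and refetch only on mismatch. This is a hedged sketch of the idea only; the field names and the generation mechanism are assumptions, not GlusterFS's actual xattr scheme.

```python
# Hypothetical layout-generation check (names are illustrative).
# Happy path: one comparison, O(1). Miss path: refetch, O(subvols).
class CachedLayout:
    def __init__(self, ranges, generation):
        self.ranges, self.generation = ranges, generation

def consume_layout(cached, read_disk_generation, refetch):
    """Return a layout that is known to match the on-disk state."""
    disk_gen = read_disk_generation()     # cheap check before the fop
    if cached.generation != disk_gen:     # stale: a fix-layout ran meanwhile
        cached = refetch()                # expensive path, taken rarely
    return cached
```

This matches the stated cost: O(1) in the happy case, with the full refetch paid only when a fix-layout actually changed the layout underneath the cache.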
Pain points (contd)
Brick-full scenario: is distribution really transparent?
- min-free-disk by itself is not sufficient.
Solutions:
- Sharding.
- On-the-fly migration during writes.
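A sketch of why a full brick breaks transparency. When the hashed subvol is below its free-space watermark, the file must be created elsewhere, and DHT then needs a pointer (a linkto entry) on the hashed subvol so lookups still resolve. The threshold and function names below are illustrative, not GlusterFS's actual option handling.

```python
# Hypothetical placement policy around a min-free-disk watermark.
def pick_subvol(hashed_idx, free_bytes, file_size, min_free=10 * 2**30):
    """Return (subvol_index, needs_link) for a new file of file_size bytes."""
    if free_bytes[hashed_idx] - file_size >= min_free:
        return hashed_idx, False          # normal, transparent placement
    # Hashed subvol too full: spill to the emptiest subvol. Distribution is
    # no longer transparent -- a linkto entry is needed on the hashed subvol.
    best = max(range(len(free_bytes)), key=lambda i: free_bytes[i])
    return best, best != hashed_idx
```

Note the watermark only helps at create time: a file that grows after creation can still fill the brick, which is why sharding or on-the-fly migration during writes is needed as well.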
Pain points (contd)
Readdir and rebalance
- Duplicate listings and missing files in readdir listings during rebalance.
Rebalance speed
- No estimate of completion time; lack of granularity.
- Missing directories in readdir listings during rebalance: healing of directories and data movement happen together, so after a fix-layout on the parent, (sub)directories are not present on the hashed subvol until the entire subtree is migrated.
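The duplicate-listing problem comes from a file existing on two subvols at once during migration: the real file on one, a placeholder on the other. A merged readdir has to drop the placeholders or the same name appears twice. The sketch below marks placeholders with a boolean for simplicity; GlusterFS actually identifies linkto files by a special mode plus a DHT linkto xattr on the entry.

```python
# Merge per-subvol readdir results, skipping migration placeholders
# and duplicate names (illustrative model, not GlusterFS code).
def merge_listings(per_subvol):
    """per_subvol: lists of (name, is_linkto) tuples from each subvol."""
    seen, merged = set(), []
    for listing in per_subvol:
        for name, is_linkto in listing:
            if is_linkto or name in seen:   # skip placeholders and dups
                continue
            seen.add(name)
            merged.append(name)
    return merged
```

The race the slide describes is the window where neither copy is a stable placeholder yet, so a name can be listed twice or missed entirely.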
Are there any advantages at all?
- Has been in production for years.
- Distributed workload for dentry ops on files and for readdir.
- Colocation of data and metadata for files.
- No single point of failure. Conditions apply :)
- Can support really large directories.
(advantages…)
- If some subvols are down during directory creation, new files don't go into them: good for permanently down subvols, bad for transient failures.
- The presence of the directory structure on all subvols makes implementing features like quota trivial.
- One can just mount a subvol of DHT as an independent volume.
Thank you