OFVWG: Erasure Coding RDMA Offload
Sagi Grimberg
Problem Statement
• Modern storage arrays are usually distributed in a clustered environment.
• Problem: Disks and/or nodes inevitably tend to fail.
– How can we survive failures and keep our data intact?
RAID 1 (Replication)
• Instead of storing the data once, we store additional copies of it on other disks/nodes.
• If a disk/node fails, we can still recover the data from a surviving copy.
• If we want to survive X failures, we need X additional replicas of the data (X + 1 copies in total).
RAID 1 pros/cons
• Pros:
  – Simple to do
  – No need for extra computation
  – No need for reconstruct logic
• Cons:
  – Requires a large amount of storage space for redundancy
  – Inefficient wire utilization
RAID 5 (single parity block)
• We divide our data into X blocks, calculate a single parity block, and store it as well.
• If any single drive fails, we can reconstruct the original data from the surviving blocks and the parity block.
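As a minimal sketch of the single-parity idea, assuming the parity block is the byte-wise XOR of the data blocks (the usual RAID 5 construction; the slides do not spell it out), the following Python snippet builds a parity block and rebuilds one lost block. The helper name `xor_blocks` and the sample data are illustrative only.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three equal-size data blocks (X = 3) and their single parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing block 1, then rebuild it from the survivors plus parity.
survivors = [blk for i, blk in enumerate(data) if i != 1]
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt == data[1])
```

Because XOR is its own inverse, the same routine serves for both encoding and reconstruction, which is why a single parity block can cover exactly one failure.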
RAID 5 pros/cons
• Pros:
  – Efficient storage utilization (small storage space for redundancy)
  – Efficient wire utilization
• Cons:
  – Requires computation to generate the parity block
  – Requires computation to reconstruct the original data
  – Needs multi-level RAID to survive more than a single failure
RAID 6 (dual parity block)
• We divide our data into X blocks, calculate two parity blocks, and store them as well.
• If any two drives/nodes fail, we can reconstruct the original data from the surviving blocks and the parity blocks.
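The slides do not say how the two parity blocks are computed. One common construction, assumed here, is P+Q parity over GF(2^8): P is the plain XOR of the data blocks and Q is a weighted sum using powers of the generator 2 under the field polynomial 0x11D. The sketch below covers only the case where two data blocks are lost; all function names are illustrative.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply two elements of GF(2^8), reducing by the field polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result

def gf_pow(a, n):
    """Raise a to the n-th power in GF(2^8) by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a):
    """Inverse of a nonzero element: a^254, because a^255 = 1."""
    return gf_pow(a, 254)

def encode_pq(data):
    """Return (P, Q): P = XOR of blocks, Q = XOR of 2^i * block_i."""
    size = len(data[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(data):
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)

def recover_two(data, x, y, p, q):
    """Rebuild lost data blocks x and y (passed as None) from P and Q."""
    size = len(p)
    # Treat the lost blocks as zeros to get the survivors' partial parities.
    filled = [blk if blk is not None else bytes(size) for blk in data]
    p_part, q_part = encode_pq(filled)
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    denom_inv = gf_inv(gx ^ gy)
    dx, dy = bytearray(size), bytearray(size)
    for j in range(size):
        a = p[j] ^ p_part[j]          # equals Dx + Dy
        b = q[j] ^ q_part[j]          # equals 2^x * Dx + 2^y * Dy
        dx[j] = gf_mul(denom_inv, b ^ gf_mul(gy, a))
        dy[j] = a ^ dx[j]
    return bytes(dx), bytes(dy)

data = [b"disk", b"node", b"raid", b"six!"]
p, q = encode_pq(data)
damaged = [data[0], None, data[2], None]
d1, d3 = recover_two(damaged, 1, 3, p, q)
print(d1 == data[1], d3 == data[3])
```

Losing one data block plus one parity block, or both parity blocks, is handled by simpler special cases (plain XOR recovery or re-encoding) that are omitted here.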
RAID 6 pros/cons
• Pros:
  – Efficient storage utilization (small storage space for redundancy)
  – Efficient wire utilization
• Cons:
  – Requires computation to generate two parity blocks
  – Requires computation to reconstruct the original data
  – Needs multi-level RAID to survive more than two failures
Erasure coding (generalize RAID)
• There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes).
• The mathematical approach is to use higher-degree polynomials over Galois (finite) fields GF(2^w) so that a given number of disk/node failures can be survived with minimum storage.
• Codes can be systematic (raw data is stored) or non-systematic (data projections are stored).
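To make the systematic case concrete, here is a minimal sketch of an MDS erasure code over GF(2^4) built from a Cauchy matrix. The field polynomial (x^4 + x + 1), the choice of Cauchy points, the 4-bit symbol size, and every function name are assumptions made for illustration; they are not taken from the slides or from any particular library.

```python
W = 4
POLY = 0x13  # x^4 + x + 1, a primitive polynomial for GF(2^4)

def gf_mul(a, b):
    """Multiply two elements of GF(2^W)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & (1 << W):
            a ^= POLY
        b >>= 1
    return result

def gf_inv(a):
    """Inverse by brute force, which is fine for a 16-element field."""
    return next(x for x in range(1, 1 << W) if gf_mul(a, x) == 1)

def cauchy_rows(k, m):
    """m x k Cauchy matrix: entry (i, j) = 1 / (x_i + y_j), all points distinct."""
    return [[gf_inv(i ^ (m + j)) for j in range(k)] for i in range(m)]

def encode(data, m):
    """Systematic encode: the k raw symbols followed by m redundancy symbols."""
    parity = []
    for row in cauchy_rows(len(data), m):
        acc = 0
        for coeff, symbol in zip(row, data):
            acc ^= gf_mul(coeff, symbol)
        parity.append(acc)
    return list(data) + parity

def decode(survivors, k, m):
    """Recover the k data symbols from a dict {index: symbol} of >= k survivors."""
    generator = [[int(i == j) for j in range(k)] for i in range(k)] + cauchy_rows(k, m)
    picked = sorted(survivors)[:k]
    # Augmented matrix: the surviving generator rows next to their symbols.
    aug = [generator[i][:] + [survivors[i]] for i in picked]
    for col in range(k):  # Gauss-Jordan elimination over GF(2^W)
        pivot = next(r for r in range(col, k) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = gf_inv(aug[col][col])
        aug[col] = [gf_mul(inv, v) for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [v ^ gf_mul(factor, p) for v, p in zip(aug[r], aug[col])]
    return [row[k] for row in aug]

data = [3, 7, 12, 9]                 # K = 4 symbols of 4 bits each
codeword = encode(data, m=2)         # M = 2 redundancy symbols
survivors = {i: s for i, s in enumerate(codeword) if i not in (1, 2)}
print(decode(survivors, k=4, m=2) == data)
```

The first K symbols of the codeword are the raw data, which is what makes the code systematic; a non-systematic code would store only linear combinations of the data.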
Erasure coding (generalize RAID)
• Erasure codes allow us to survive M failures for any K data blocks, where K + M ≤ 2^w.
• For example, if we use GF(2^4) and want to survive 4 disk failures, we can protect 12 data blocks.
  – This means the redundancy costs only 33.3% extra storage on top of the data (4 redundancy blocks for 12 data blocks), or 25% of the total storage.
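The arithmetic behind this example can be checked with a few lines of Python. The function names are illustrative; the replication figure assumes M additional full copies are needed to survive M failures.

```python
def max_data_blocks(w, m):
    """Largest K allowed by the constraint K + M <= 2^w."""
    return (1 << w) - m

def redundancy_overhead(k, m):
    """Redundancy as a fraction of the data size and of the total stored size."""
    return m / k, m / (k + m)

m = 4
k = max_data_blocks(w=4, m=m)
over_data, over_total = redundancy_overhead(k, m)
replication_over_data = m / 1  # M extra full copies of every block

print(k, f"{over_data:.1%}", f"{over_total:.1%}", f"{replication_over_data:.1%}")
# prints: 12 33.3% 25.0% 400.0%
```

For the same tolerance of 4 failures, replication would need 400% extra storage, against 33.3% for the (K = 12, M = 4) erasure code.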
Erasure coding Illustration
Erasure coding Decode Illustration
Erasure coding Decode Illustration
Erasure coding pros/cons
• Pros:
  – *Very* efficient storage utilization (small storage space for redundancy)
  – *Very* efficient wire utilization
  – Users can choose their own (K, M) configuration; no need for multi-level RAID
• Cons:
  – Large computation overhead needed to generate the redundancy metadata blocks
  – Large computation overhead needed to reconstruct the original data
RDMA Erasure coding offload
• Erasure code calculations are CPU intensive.
• Next-generation HCAs can offer a calculation engine.
• These HCAs can also offer a coherent calculation and networking solution.
Programming model - SW
Programming model - Synchronous
Programming model - Asynchronous
Programming model – Full striping
API – Erasure coding context
• EC context verbs representation
• Allocation/Deallocation API
API – EC init attributes
API – EC memory layout
API – Synchronous Encode
API – Asynchronous Encode
API – Asynchronous Encode
API – Verbs stripe object
• In order to perform the full striping operation via a single API call, we need to provide our striping layout (who gets what).
API – Encode + Transfer
API – Synchronous Decode
API – Asynchronous Decode
• Pretty much the same idea
Thank You