NetRAID for the Linux Kernel (UKUUG/LISA WCHAR 2004)

NetRAID, Peter T. Breuer

description

Slides for presentation of "NetRAID for the Linux Kernel" at UKUUG LISA/Winter Conference on High-Availability and Reliability, Feb. 2004. The preprint for the full article is at http://www.academia.edu/2493525/NetRAID_for_the_Linux_Kernel .

Transcript of NetRAID for the Linux Kernel (UKUUG/LISA WCHAR 2004)

Page 1: NetRAID for the Linux Kernel  (UKUUG/LISA WCHAR 2004)

NetRAID

Peter T. Breuer

Page 2: NetRAID for the Linux Kernel  (UKUUG/LISA WCHAR 2004)

Failover Loves Mirroring

[Diagram: RAID1 + NBD. Labels: 12.2.1.3 (shown twice), raid1, nbd, disk]

Page 3: NetRAID for the Linux Kernel  (UKUUG/LISA WCHAR 2004)

Resynchronization Snooze

[Diagram: 12.2.1.3 with "ZZZ...". Labels: raid1, nbd]

Page 4: NetRAID for the Linux Kernel  (UKUUG/LISA WCHAR 2004)

The Numbers Game

100BT LAN = 10MB/s

1TB mirror @ 40MB/s = 25000s

about 7 hours!

Temporary network outages = frequent

Permanent disk losses = infrequent

Adds up to a need for a changed paradigm
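The arithmetic on this slide can be checked with a few lines of Python. This is only a back-of-envelope sketch: it assumes decimal units (1 TB = 10^12 bytes) and a sustained transfer rate, and the 10MB/s row is an extrapolation that applies the slide's 100BT LAN figure to the same 1TB mirror. The function name `resync_seconds` is made up for the illustration.

```python
# Back-of-envelope full-resync times, assuming decimal units and a sustained rate.
MB = 10**6
TB = 10**12

def resync_seconds(size_bytes, rate_bytes_per_s):
    """Time to copy a whole mirror component at a constant rate."""
    return size_bytes / rate_bytes_per_s

for label, rate in [("40 MB/s", 40 * MB), ("10 MB/s (100BT LAN)", 10 * MB)]:
    secs = resync_seconds(1 * TB, rate)
    print(f"1 TB at {label}: {secs:.0f} s = {secs / 3600:.1f} h")
```

At 40MB/s the full copy takes 25000s, roughly seven hours; at LAN speed it would take four times as long.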

Page 5: NetRAID for the Linux Kernel  (UKUUG/LISA WCHAR 2004)

What's wrong with ordinary RAID1?

Full resync too slow over the net

Net dropouts too frequent

trigger full resync

Does not expect same disk to be restored

Network glitches are cable errors

Requires on-site administration!

Writes synchronous to both sides - too slow

Reads may be from the slow side

Page 6: NetRAID for the Linux Kernel  (UKUUG/LISA WCHAR 2004)

RAID vs netRAID

Classical

small disks

physically close

medium bandwidth

infrequent dropouts

permanent losses

admin on hand

netRAID

large disks

physically dispersed

low bandwidth

frequent dropouts

temporary losses

admin off-scene

Page 7: NetRAID for the Linux Kernel  (UKUUG/LISA WCHAR 2004)

Solutions

Replace drivers

Linux kernel NBD → ENBD

Linux kernel RAID1 → FR1

Replace problems

disk fail is permanent → disk fail is temporary

repair by insert new disk → repair by reinsert old

admin does repair → device repairs itself

cables never fail → cables often fail

Page 8: NetRAID for the Linux Kernel  (UKUUG/LISA WCHAR 2004)

ENBD

automatic reconnect after network outage

blocks instead of returning errors during a temporary outage

redundant channel connectivity

(partitionable)

accelerated - skips writes whose data is already equal on both sides

talks to soft RAID overlay driver

supports remote ioctls and removable devices
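One way to picture the "skips writes equal both sides" item is a digest comparison before each transfer. The slides do not say how ENBD detects equal data, so the sketch below is an assumption made for illustration: `RemoteDisk`, `accelerated_write`, the 4096-byte block size, and the choice of SHA-256 are all hypothetical and are not taken from the ENBD code.

```python
import hashlib

BLOCK = 4096  # illustrative block size (assumption)

class RemoteDisk:
    """Stand-in for the far side of the network link (hypothetical)."""
    def __init__(self, nblocks):
        self.blocks = [bytes(BLOCK)] * nblocks
        self.bytes_received = 0  # payload bytes that crossed the "network"

    def digest(self, n):
        # Only a short digest travels back, not the block itself.
        return hashlib.sha256(self.blocks[n]).digest()

    def write(self, n, data):
        self.blocks[n] = data
        self.bytes_received += len(data)

def accelerated_write(remote, n, data):
    """Send the block only if the remote copy differs; return True if sent."""
    if hashlib.sha256(data).digest() == remote.digest(n):
        return False  # both sides already hold the same data: skip the transfer
    remote.write(n, data)
    return True

remote = RemoteDisk(4)
payload = b"x" * BLOCK
first = accelerated_write(remote, 0, payload)   # data differs, block is sent
second = accelerated_write(remote, 0, payload)  # data equal, transfer skipped
```

The trade-off in such a scheme is one small round trip per block in exchange for not resending payloads the remote side already has, which matters most during a resync over a slow link.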

Page 9: NetRAID for the Linux Kernel  (UKUUG/LISA WCHAR 2004)

FR1

full resync → intelligent partial resync

hot repair

automatic

asynchronous writes eliminate latency

read from fastest (not there yet)

retain state across reboots (Paul Clements)

Page 10: NetRAID for the Linux Kernel  (UKUUG/LISA WCHAR 2004)

FR1 intelligent resync

● resync max 40MB/s
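The idea behind an intelligent partial resync can be sketched as a record of which blocks were written while the remote component was away, so that only those blocks are copied when the same disk returns. The following is a toy model under that assumption, not the FR1 implementation: the `Mirror` class and its methods are hypothetical, and a Python set stands in for an on-disk bitmap.

```python
class Mirror:
    """Toy two-way mirror that tracks blocks dirtied during an outage."""
    def __init__(self, nblocks):
        self.local = [b""] * nblocks
        self.remote = [b""] * nblocks
        self.remote_up = True
        self.dirty = set()  # stands in for a bitmap of out-of-sync blocks

    def write(self, n, data):
        self.local[n] = data
        if self.remote_up:
            self.remote[n] = data
        else:
            self.dirty.add(n)  # remember the block instead of failing the array

    def fail_remote(self):
        self.remote_up = False

    def restore_remote(self):
        """Reinsert the same disk and copy only the blocks marked dirty."""
        copied = 0
        for n in sorted(self.dirty):
            self.remote[n] = self.local[n]
            copied += 1
        self.dirty.clear()
        self.remote_up = True
        return copied

m = Mirror(1000)
m.write(0, b"a")
m.fail_remote()
for n, data in [(5, b"b"), (7, b"c"), (5, b"d")]:
    m.write(n, data)
copied = m.restore_remote()  # only blocks 5 and 7 are copied, not all 1000
```

The cost of the repair then scales with the amount written during the outage rather than with the size of the disk.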

Page 11: NetRAID for the Linux Kernel  (UKUUG/LISA WCHAR 2004)

ENBD performance measure (read)

● n = 1, 2, 4 channels

Page 12: NetRAID for the Linux Kernel  (UKUUG/LISA WCHAR 2004)

ENBD performance measure (write)

● n = 1, 2, 4 channels

Page 13: NetRAID for the Linux Kernel  (UKUUG/LISA WCHAR 2004)

netRAID1 nuances

With mirrored journal

must preserve write ordering!

immediate takeover - no fsck!

Without

3x faster!

needs fsck

Detecting failure

private or public connectivity test?
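The write-ordering requirement for a mirrored journal can be illustrated with a single FIFO queue drained by one worker: writes return to the caller at once, yet the remote side sees them in submission order. This is a minimal sketch under that assumption and does not describe how FR1 or ENBD actually schedule writes; `AsyncMirrorLeg` and its methods are hypothetical names.

```python
import queue
import threading

class AsyncMirrorLeg:
    """Asynchronous remote leg that preserves the order of submitted writes."""
    def __init__(self):
        self.applied = []        # (block, data) in the order the remote saw them
        self._q = queue.Queue()  # FIFO: dequeue order equals submit order
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def submit(self, block, data):
        """Return immediately; the single worker sends the write later."""
        self._q.put((block, data))

    def _drain(self):
        while True:
            item = self._q.get()
            self.applied.append(item)  # stands in for the network send
            self._q.task_done()

    def flush(self):
        """Wait until every queued write has been applied."""
        self._q.join()

leg = AsyncMirrorLeg()
writes = [(1, b"journal entry"), (9, b"data"), (1, b"journal commit")]
for block, data in writes:
    leg.submit(block, data)
leg.flush()
```

Using a single worker is what keeps the order; several workers sending in parallel could let a journal commit overtake the data it covers.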


Page 14: NetRAID for the Linux Kernel  (UKUUG/LISA WCHAR 2004)

Summary

Component-based assembly

ENBD - remote network disk

FR1 - Fast RAID1

FS - any file system

easier to parcel out development

more testing

easier to slip partial support into the kernel

FS agnostic

Work together for replication, failover, recovery

Page 15: NetRAID for the Linux Kernel  (UKUUG/LISA WCHAR 2004)

Bibliography

● Paul Clements & James E.J. Bottomley. High Availability Data Replication. Proc. Linux Symposium, Ottawa, Ontario, Canada, July 2003. http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-Clements-OLS2003.pdf

● P.T. Breuer et al. The Network Block Device. Linux Journal, issue 73. http://www2.linuxjournal.com/lj-issues/issue73/3778.html