High Speed Total Order for SAN infrastructure
Tal Anker, Danny Dolev, Gregory Greenman, Ilya Shnaiderman
School of Engineering and Computer Science
The Hebrew University of Jerusalem
Storage Area Network (SAN)
SAN is a high-speed, special-purpose subnetwork of shared storage devices.
SAN makes all storage devices available to all servers on a LAN or WAN.
SAN
[Figure: SAN illustration, from the Microsoft web site]
SAN services
- Disk mirroring
- Backup and restore
- Archival and retrieval of archived data
- Data migration
- Fault tolerance
- Data sharing among different servers
Fault Tolerance by State Machine Replication
- An internal server is “cloned” (replicated)
- Each replica maintains its own state
- The states of the replicas are kept consistent by applying transactions to all the replicas in the same order
- The order of transactions is established by a total order algorithm
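To make the replication idea concrete, here is a minimal Python sketch, assuming a toy key-value state machine in which each transaction is a deterministic (key, value) write. `Replica` and `replicate` are hypothetical names for illustration, not code from the authors. It shows that replicas applying the same transactions in the same order end in the same state, while a different order can make them diverge.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    """One replica of a toy key-value state machine (hypothetical, illustrative only)."""
    name: str
    state: dict = field(default_factory=dict)

    def apply(self, txn):
        # A transaction is a (key, value) write; applying it is deterministic.
        key, value = txn
        self.state[key] = value

def replicate(replicas, ordered_txns):
    # Every replica applies the same transactions in the same total order.
    for txn in ordered_txns:
        for replica in replicas:
            replica.apply(txn)

replicas = [Replica("R1"), Replica("R2"), Replica("R3")]
replicate(replicas, [("x", 1), ("y", 2), ("x", 3)])
print([r.state for r in replicas])
# [{'x': 3, 'y': 2}, {'x': 3, 'y': 2}, {'x': 3, 'y': 2}]

# Without an agreed order, the same two writes can leave replicas inconsistent.
a, b = Replica("A"), Replica("B")
replicate([a], [("x", 1), ("x", 3)])
replicate([b], [("x", 3), ("x", 1)])
print(a.state, b.state)  # {'x': 3} {'x': 1}
```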
Types of Total Order Algorithms
- Symmetric (Lamport 78): messages are ordered based on a logical timestamp and the sender’s id
- Leader-based (Sequencer): all messages are ordered by a special process called the sequencer
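The two families can be contrasted in a short sketch. It assumes each message carries a Lamport logical timestamp and a sender id; `Msg`, `symmetric_order`, and `Sequencer` are hypothetical names. The symmetric part shows only the deterministic sort key (timestamp, then sender id to break ties); a real symmetric protocol must also wait until no message with a smaller timestamp can still arrive, which is omitted here.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class Msg:
    sender: int     # sender's process id
    timestamp: int  # Lamport logical clock value at send time
    payload: str

def symmetric_order(msgs):
    # Symmetric ordering: sort by (logical timestamp, sender id). Ties on the
    # timestamp are broken by sender id, so every process holding the same set
    # of messages computes the same order without a central coordinator.
    return sorted(msgs, key=lambda m: (m.timestamp, m.sender))

class Sequencer:
    """Leader-based ordering: one process stamps a global sequence number."""

    def __init__(self):
        self._next = count(1)

    def assign(self, msg):
        # The order in which messages reach the sequencer becomes the total order.
        return (next(self._next), msg)

msgs = [Msg(2, 5, "b"), Msg(1, 5, "a"), Msg(3, 4, "c")]
print([m.payload for m in symmetric_order(msgs)])  # ['c', 'a', 'b']

seq = Sequencer()
stamped = [seq.assign(m) for m in msgs]  # messages arrive at the sequencer in list order
print([(n, m.payload) for n, m in stamped])  # [(1, 'b'), (2, 'a'), (3, 'c')]
```

In the leader-based variant, the order is simply the arrival order at the sequencer, which is the property the Virtual Sequencer obtains from switch hardware instead of a dedicated process.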
Data Sharing and Locking
In order to keep shared files consistent, a lock mechanism must be implemented, taking into account that:
- locking is a frequent event in SAN
- SAN is expected to provide high throughput
- lock requests should be served promptly (low latency)
Hardware Solution
- Low latency with high throughput is achievable by means of special hardware
- Building novel hardware, however, is expensive and time-consuming
- Is it possible to achieve the goal using off-the-shelf hardware?
Network Message Order
- Messages are ordered by the network
- The order may not be the same for different replicas due to:
  - message losses
  - internal loopback
- Neutralizing measures:
  - a flow control mechanism to minimize message losses
  - disabling loopback (but how do I discover the order of my own messages without loopback?)
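The two causes of divergence can be illustrated with a small simulation. It is a sketch under simplifying assumptions: a sender's own message is looped back to it locally at send time, ahead of anything the network delivers, and a lost message is recovered by a retransmission that arrives after all other traffic. `delivery_orders` is a hypothetical helper; P1, P2, P3, m1 and m2 follow the slides' naming.

```python
def delivery_orders(processes, network_order, lost=()):
    """Per-process delivery order (hypothetical model, illustrative only).

    network_order: (sender, message) pairs in the order the network carries them.
    lost: (process, message) pairs dropped on first delivery and retransmitted
          after all other traffic.
    """
    orders = {}
    for p in processes:
        # Loopback: the sender sees its own message first, at send time.
        own = [m for s, m in network_order if s == p]
        # Messages from others arrive in network order, minus any that were lost.
        from_net = [m for s, m in network_order if s != p and (p, m) not in lost]
        # Lost messages show up last, via retransmission.
        retransmitted = [m for s, m in network_order if s != p and (p, m) in lost]
        orders[p] = own + from_net + retransmitted
    return orders

procs = ["P1", "P2", "P3"]
net = [("P1", "m1"), ("P2", "m2")]

with_loopback = delivery_orders(procs, net)
print(with_loopback)
# {'P1': ['m1', 'm2'], 'P2': ['m2', 'm1'], 'P3': ['m1', 'm2']}

with_loss = delivery_orders(procs, net, lost={("P3", "m1")})
print(with_loss)
# {'P1': ['m1', 'm2'], 'P2': ['m2', 'm1'], 'P3': ['m2', 'm1']}
```

Even though the network itself carried m1 before m2, loopback makes P2 disagree, and a single loss makes P3 disagree as well.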
Network Order
[Figure: processes P1, P2 and P3 exchanging messages m1 and m2, which different processes may receive in different orders]
The Solution: a Virtual Sequencer
- Each machine is equipped with TWO Network Interface Cards (NIC1 and NIC2)
- Data is sent via NIC1
- Data is received via NIC2
- NIC1 cards are connected to switch A
- NIC2 cards are connected to switch B
- Switch A and the link from it to switch B serve as the Virtual Sequencer
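A simplified model of the two-switch arrangement shows why it behaves like a sequencer. It assumes no frame loss and FIFO queues in both switches; `two_switch_delivery` is a hypothetical helper. Because every frame leaves switch A over the single link to switch B, that link serializes all traffic, and switch B hands frames to every NIC2 port in that one order.

```python
from collections import deque

def two_switch_delivery(processes, sends):
    """Hypothetical model of the Virtual Sequencer (no loss, FIFO queues).

    sends: (sender, message) pairs in the order switch A receives them on its
    NIC1-facing ports.
    """
    uplink = deque()
    for _sender, msg in sends:
        # Switch A: many input ports feed one FIFO queue on the link to switch B.
        uplink.append(msg)
    delivered = {p: [] for p in processes}
    while uplink:
        # Switch B: each frame, in link order, goes to every NIC2 port,
        # including the NIC2 of the machine that sent it.
        msg = uplink.popleft()
        for p in processes:
            delivered[p].append(msg)
    return delivered

delivered = two_switch_delivery(["P1", "P2", "P3"], [("P1", "m1"), ("P2", "m2")])
print(delivered)
# {'P1': ['m1', 'm2'], 'P2': ['m1', 'm2'], 'P3': ['m1', 'm2']}
```

Since each sender also receives its own message back through NIC2, it learns where that message sits in the common order without relying on internal loopback.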
Two-Switch Network
[Figure: P1, P2 and P3 send m1 and m2 through Switch A; Switch B delivers them to all processes in the same order]
Implementation Issues
- Recommended switch features:
  - IGMP snooping (highly recommended)
  - Traffic shaping (recommended)
  - Jumbo frames (optional)
- Required OS feature:
  - Ability to disable Reverse Path Forwarding (RPF) checking
Test Bed
- 5 Pentium 500 MHz machines with a 32-bit, 33 MHz PCI bus
- Each machine runs Debian GNU/Linux (kernel 2.4.25)
- Each machine is equipped with 2 Intel PRO/1000 MT Desktop Adapters
- 2 Dell 6024 switches
Order Guarantees
- Preliminary Order (PO): each machine guesses the PO on the basis of the network order; the PO can be changed later
- Uniform Total Order (UTO): the UTO is never changed (even for faulty machines); it is established by collecting acknowledgments from all the machines
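The acknowledgment rule for UTO can be sketched as follows, assuming each machine records its PO as the network arrival order and that every acknowledgment confirms the same position; reconciling a PO that turns out to be wrong (the case where the PO is changed later) is omitted. `UtoTracker` is a hypothetical class, not the authors' implementation.

```python
class UtoTracker:
    """Hypothetical sketch: PO on arrival, UTO once every machine has acked."""

    def __init__(self, machines):
        self.machines = frozenset(machines)
        self.po = []    # preliminary order: network arrival order
        self.acks = {}  # message id -> set of machines that acknowledged it
        self.uto = []   # uniform total order: append-only, never reordered

    def receive(self, msg_id):
        self.po.append(msg_id)
        self.acks.setdefault(msg_id, set())

    def ack(self, msg_id, machine):
        self.acks.setdefault(msg_id, set()).add(machine)
        self._advance()

    def _advance(self):
        # Promote messages strictly in PO order: the next one joins the UTO only
        # when all machines have acked it, and later messages wait behind it.
        while len(self.uto) < len(self.po):
            nxt = self.po[len(self.uto)]
            if self.acks.get(nxt, set()) >= self.machines:
                self.uto.append(nxt)
            else:
                break

t = UtoTracker(["A", "B", "C"])
t.receive("m1")
t.receive("m2")
for machine in ("A", "B", "C"):
    t.ack("m2", machine)
waiting = list(t.uto)  # m2 is fully acked but is queued behind m1
for machine in ("A", "B", "C"):
    t.ack("m1", machine)
print(waiting, t.uto)  # [] ['m1', 'm2']
```

Waiting for every machine is what makes UTO final, and it is also why UTO latency in the measurements below is higher than PO latency.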
All-to-All Experiment
| Number of machines | Throughput (Mb/s) | PO latency (ms) | UTO latency (ms) |
|---|---|---|---|
| 3 | 361.7 | 2.183 | 4.163 |
| 4 | 375.3 | 3.398 | 5.362 |
| 5 | 383.9 | 3.410 | 6.782 |
Attention: PCI bus is a real bottleneck!
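As a rough check on the bottleneck note, the theoretical peak bandwidth of the test bed's 32-bit, 33 MHz PCI bus can be worked out directly. This is a back-of-the-envelope sketch: it ignores bus arbitration and protocol overhead, and it assumes both network adapters share the same bus, as the single bus in the test bed description suggests.

$$
B_{\mathrm{PCI}} = 32\,\mathrm{bit} \times 33 \times 10^{6}\,\mathrm{s}^{-1} = 1.056 \times 10^{9}\,\mathrm{bit/s} \approx 1056\,\mathrm{Mb/s} \;(132\,\mathrm{MB/s})
$$

Every frame a machine sends crosses the bus to NIC1 and every frame it receives crosses it again from NIC2, so sending and receiving traffic draw on this one shared budget.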
Disjoint sets of senders and receivers
| Senders \ Receivers | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1 | 505.9 | 495.4 | 489.2 | 476.4 |
| 2 | 518.3 | 504.8 | 493.7 | |
| 3 | 519.5 | 505.9 | | |
| 4 | 519.8 | | | |
Attention: scalable in the number of senders; less scalable in the number of receivers, because the number of ACKs grows with the number of receivers.
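The scaling claim can be checked with simple arithmetic on the disjoint-sets table. This sketch copies the table's values (indexed as senders, then receivers) and compares the corner cells; `pct_change` is a hypothetical helper.

```python
# Values copied from the disjoint-sets table: table[senders][receivers].
# Cells that are empty in the table are simply absent here.
table = {
    1: {1: 505.9, 2: 495.4, 3: 489.2, 4: 476.4},
    2: {1: 518.3, 2: 504.8, 3: 493.7},
    3: {1: 519.5, 2: 505.9},
    4: {1: 519.8},
}

def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old

# One sender, 1 -> 4 receivers (first row): throughput falls.
more_receivers = pct_change(table[1][1], table[1][4])
# One receiver, 1 -> 4 senders (first column): throughput rises slightly.
more_senders = pct_change(table[1][1], table[4][1])

print(f"1->4 receivers: {more_receivers:+.1f}%")  # 1->4 receivers: -5.8%
print(f"1->4 senders: {more_senders:+.1f}%")      # 1->4 senders: +2.7%
```

Adding receivers costs about 5.8% while adding senders gains about 2.7%, consistent with the note that receivers, and hence ACKs, are the limiting factor.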
Latency vs. Throughput (All-to-All)
Latency vs. Throughput (disjoint sets of senders and receivers)
Locks and Preliminary Order
- Lock requests may be served faster on the basis of the Preliminary Order
- Each server may accept or reject a lock request according to its Preliminary Order
- A server's response is taken into account only if that server guessed the request order correctly
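A minimal sketch of this rule, assuming a single lock with no releases, where a server grants the first request in its own PO and rejects the rest, and where guessing correctly is modeled as the server's whole PO matching the final UTO. `grant_by_order` and `valid_responses` are hypothetical helpers, not the authors' lock protocol.

```python
def grant_by_order(order):
    """A server's answers for one lock, processed in `order` (hypothetical model):
    the first request is granted, later ones are rejected while it is held."""
    return {req: (i == 0) for i, req in enumerate(order)}

def valid_responses(server_po, uto):
    """Keep the answers only of servers whose preliminary order matched the UTO."""
    return {srv: grant_by_order(po) for srv, po in server_po.items() if po == uto}

server_po = {
    "S1": ["r1", "r2"],  # guessed the final order
    "S2": ["r2", "r1"],  # guessed wrong, e.g. after a lost frame
    "S3": ["r1", "r2"],
}
uto = ["r1", "r2"]

valid = valid_responses(server_po, uto)
print(valid)
# {'S1': {'r1': True, 'r2': False}, 'S3': {'r1': True, 'r2': False}}
```

Servers answer as soon as they have a PO, so in the common case a lock decision is available at PO latency rather than UTO latency; only answers from servers that guessed wrong are discarded.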
Packet Aggregation
- In the experiments, results were obtained for MTU-size messages; however, lock requests are usually small messages
- A solution implementing simple aggregation of lock requests would increase the latency of the locking mechanism
- To keep the latency low, a modification of the Nagle algorithm was used to perform “smart” packet aggregation
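The slides do not describe the authors' modification of the Nagle algorithm, so the sketch below shows only the classic Nagle rule applied to lock requests, as an assumed starting point: when nothing is outstanding, a small request is sent at once, keeping latency low; otherwise requests are batched until the packet would exceed the MTU or the outstanding packet is acknowledged. `Aggregator` is a hypothetical class, and its `mtu` check counts payload bytes only.

```python
class Aggregator:
    """Classic Nagle-style aggregation (hypothetical sketch, NOT the authors'
    modified algorithm). Each entry of `sent` is one packet: a list of messages."""

    def __init__(self, mtu=1500):
        self.mtu = mtu            # payload bytes per packet; headers ignored
        self.buffer = []
        self.buffered_bytes = 0
        self.outstanding = False  # is a sent packet still unacknowledged?
        self.sent = []

    def _flush(self):
        if self.buffer:
            self.sent.append(self.buffer)
            self.buffer, self.buffered_bytes = [], 0
            self.outstanding = True

    def submit(self, msg: bytes):
        if not self.outstanding and not self.buffer:
            # Idle link: send immediately, so an isolated request pays no delay.
            self.sent.append([msg])
            self.outstanding = True
            return
        if self.buffered_bytes + len(msg) > self.mtu:
            # The batch would overflow one packet: send what is buffered now.
            self._flush()
        self.buffer.append(msg)
        self.buffered_bytes += len(msg)

    def on_ack(self):
        # The outstanding packet was acknowledged: send the pending batch.
        self.outstanding = False
        self._flush()

agg = Aggregator(mtu=100)
for _ in range(5):
    agg.submit(bytes(30))  # five 30-byte lock requests
after_submit = [len(p) for p in agg.sent]
agg.on_ack()
print(after_submit, [len(p) for p in agg.sent])  # [1, 3] [1, 3, 1]
```

The first request goes out alone with no added delay, and the following ones share packets, which trades a small queueing delay for far fewer packets on the wire.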
Packet Aggregation Results
Future Work
- Obtaining a patent
- Application development:
  - Games
  - Grid computation
- Using NICs with a CPU to:
  - implement a zero-copy driver
  - send and aggregate acknowledgments by the NIC
- Checking scalability