More Efficient Object Replication (OpenStack Summit Juno)
Copyright©2014 NTT corp. All Rights Reserved.
Developing More Efficient Object Replication on OpenStack Swift 2014/05/16 (OpenStack Juno Design Summit)
Kota Tsuyuzaki Developer (Swift ATC) Advanced Information Processing Technology SE Project NTT Software Innovation Center
Outline
1. Global Distributed Cluster
2. More Efficient Object Replication
3. Benchmark Analysis
Extra: ssync issue
Etherpad: https://etherpad.openstack.org/p/juno_swift_object_replication
1. Global Distributed Cluster
Demands:
• World Wide Services
• Capacity Optimization
• Disaster Recovery
Solution:
• Global Distributed Cluster
1. Global Distributed Cluster
Network Issues:
• High Latency: tens of ms to ~100 ms
• Narrow: 1-10 Gbps
• Expensive: ~$15,000/Gbps/mo
1. Global Distributed Cluster
Network Issues:
• High Latency: handled excellently -> Regions, Affinity Controls
[Figure: two-region topology (Region1, Region2), from the SwiftStack Blog: https://swiftstack.com/blog/]
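The affinity controls mentioned above are configured in Swift's proxy server. The following excerpt is an illustrative setup for the two-region model, assuming the standard proxy affinity options (`sorting_method`, `read_affinity`, `write_affinity`, `write_affinity_node_count`); the region name `r1` is an example and must match your ring.

```ini
# proxy-server.conf (excerpt) -- an example affinity setup for a
# two-region cluster; region names and weights must match your ring.
[app:proxy-server]
sorting_method = affinity
# Prefer reads from region 1 (lower value = higher priority).
read_affinity = r1=100
# Send initial writes to region 1 only; the replicator later moves
# the remaining copies to region 2 over the inter-region link.
write_affinity = r1
write_affinity_node_count = 2 * replicas
```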
1. Global Distributed Cluster
Network Issues:
• Narrow, Expensive: not handled well enough -> ???
Problems:
• Large Amounts of Transfer
• Replication Delay
2. More Efficient Object Replication
Objective:
Reduce the amount of replication traffic transferred between regions
(focus on the narrow network)
2. More Efficient Object Replication
Current Behavior
2. More Efficient Object Replication
Current:
Model: 2 Regions, 3 Replicas, with Write Affinity
[Diagram: a User PUTs an object over the Internet; with write affinity, all copies land on primary and handoff nodes in Region1, which connects to Region2 over the inter-region network]
2. More Efficient Object Replication
Current:
Model: 2 Regions, 3 Replicas, with Write Affinity
[Diagram: the replicator then copies the object from Region1 to the primary nodes in Region2; unfortunately, the same data crosses the inter-region network twice or more]
2. More Efficient Object Replication
Proposed Approach
2. More Efficient Object Replication
Approach:
• Only push to one remote node, chosen based on affinity
• Request that remote node to sync the data to the others
• Changes only a few lines of code in the object-replicator and object-server
[Diagram: Region1 pushes a single copy over the inter-region network ("Only push to one remote"); the receiving node in Region2 syncs the other primaries ("Sync to others")]
2. More Efficient Object Replication
*Additional code

[Object-Replicator]
    find local part suffixes
    for each:
        find other primary locations
        check remote
        if not in remote:
            if (remote region is local) or (remote region not in synced regions):
                push data
                create remote suffix, with a request to sync in the remote region
                add remote region to synced regions

[Object-Server (REPLICATE)]
    create local suffix
    if sync request in header:
        push data to requested remotes
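The replicator-side pseudocode can be sketched in Python. This is a minimal illustration of the decision rule only, not the actual Swift patch: node dicts with a `region` key stand in for ring devices, and the out-of-sync suffix check is omitted.

```python
# Sketch of the proposed replicator decision rule: push to every local
# primary, but to only ONE node per remote region; that node is also
# asked (via the REPLICATE request) to sync its regional peers.
# Hypothetical helper, not the actual Swift object-replicator code.

def choose_sync_actions(primary_nodes, local_region):
    """Return (push_targets, sync_request_targets) for one partition."""
    pushes = []          # nodes we send data to directly
    sync_requests = []   # remote nodes asked to fan out regionally
    synced_regions = set()
    for node in primary_nodes:
        if node['region'] == local_region:
            pushes.append(node)                # same region: push as before
        elif node['region'] not in synced_regions:
            pushes.append(node)                # first node in a remote region
            sync_requests.append(node)         # ask it to sync the others
            synced_regions.add(node['region'])
        # remaining nodes in an already-synced remote region are skipped;
        # the first remote node pushes to them over its local network
    return pushes, sync_requests
```

With 2 regions and 3 replicas, this crosses the inter-region link once per partition instead of up to twice.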
3. Performance Analysis
Objective:
• Analyze replication performance:
  • Total transferred data amount
  • Average network bandwidth between regions
  • One pass time
3. Benchmark Scenario
Model:
• 2 Regions, 3 Replicas
• 1 Gateway Node (GW) between the Regions
Scenario:
• Shape the GW network to 1Gbps
• Stop the object-replicator
• Load objects with Write Affinity: 8MB * 5,000 objects (40GB total)
• Run the object-replicator in once mode (32 concurrency)
Benchmark Patterns:
• Original (ssync)
• Proposed (ssync, rsync)
3. Benchmark Environment
[Diagram: Region 1 (Proxy, Storage1, Storage2 on an Infiniband LAN switch) and Region 2 (Storage3, Storage4 on an Infiniband LAN switch), joined through the GW; all links are 20Gbps, with the inter-region link shaped to 1Gbps; the Client connects over Ethernet]
Storage: CPU: 2 * Intel X5650 2.67GHz (6 core * HT); MEM: 48GB RAM; NIC: 20Gbps Infiniband; Disks: 36 * 3TB SATA (7,200 rpm)
GW: CPU: 2 * Intel X5650 2.67GHz (6 core * HT); MEM: 64GB RAM; NIC: 2 * 20Gbps Infiniband (shaped to 1Gbps)
3. Result (w/1Gbps shaping)
[Chart: One Replication Pass Time (1Gbps); elapsed time (sec), 0-600, for Original / Proposed (ssync) / Proposed (rsync)]
[Chart: Transferred Data on One Pass (1Gbps); transferred data amount (GB), 0-70, same three patterns]
[Chart: Average Network Bandwidth (1Gbps); Gbps, 0-1, same three patterns]
Findings:
- Good reduction in the transferred data amount (Very Good!)
- Only a little decrease appeared in the average network bandwidth
- Good reduction in the one pass time (Very Good!)
-- ssync is more efficient than rsync.
-- The proposed algorithm has a small overhead from waiting for nodes to finish syncing.
-- It can ensure all primary nodes are synced, in a shorter time and with a smaller amount of data transferred.
Theoretical transfer values:
- Original: 40GB * 3 replicas / 2 = 60GB (1/3 has 2 copies in Region2)
- Proposed: 40GB
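The theoretical values can be checked with a little arithmetic (assuming, as the slide does, that 3 replicas spread over 2 regions leave 1.5 copies per object to be pushed across the WAN under the original scheme):

```python
# Theoretical inter-region transfer for the 40GB benchmark load
# (3 replicas, 2 regions, all data initially written to Region1).

loaded_gb = 40                    # 8MB * 5,000 objects, as in the scenario

# Original: every remote primary copy is pushed over the WAN;
# Region2 holds 3 / 2 = 1.5 replicas per object on average.
original_gb = loaded_gb * 3 / 2   # 60.0

# Proposed: exactly one copy per object crosses the WAN; the remote
# node fans out to its regional peers over the local network.
proposed_gb = loaded_gb * 1       # 40

print(original_gb, proposed_gb)   # 60.0 40
```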
Conclusion
1. Global Distributed Cluster
• Needs more efficient replication
2. More Efficient Object Replication
• Affinity-based approach
• Only push to one remote
3. Benchmark Analysis
• Good reduction of data transfer
• Little overhead in one pass time
Acknowledgment: SwiftStack members, Ken Igarachi, Yohei Hayashi, Takashi Shito, Hiromichi Ito, Naoto Nishizono
Discussions
• Is ensuring that all nodes are synced needed?
  • Request the sync at replicate time (current):
    • Pros: able to ensure all replicas are synced
    • Cons: a little overhead from waiting for the sync
  • Do not request the sync; update the replicas asynchronously:
    • Pros: simple
    • Cons: unable to ensure all replicas are synced
• What is a good way to sync the other nodes from the Object-Server?
  • Naive (but very simple, current): reuse an object-replicator instance, which carries unnecessary information (e.g. the Ring)
  • Complex: create a dedicated syncing function or class for the object-server
  • Are there more efficient ways?
Kota Tsuyuzaki
IRC: Kota
Extra: ssync issue
Ssync:
• A replication-process improvement based on HTTP
• A replacement for rsync (designed to be slimmer)
• Sender / Receiver model
Issue:
• Parallel I/O performance problems, (possibly) caused by eventlet
• Cannot access the local disk in parallel (maybe a constraint of the Python VM)
• Slower than rsync in my experiment
• Possible solution:
  • Launch the sender as a subprocess so that another CPU core can be used for the disk read, similar to rsync.
  • When using os.fork(), performance improved to roughly the same level as rsync.
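The os.fork() workaround mentioned above can be sketched as follows. This is a simplified illustration (the helper name and job shape are made up, not the actual ssync patch): each forked child performs its disk read and send in its own OS process, so the reads are no longer serialized behind the parent's eventlet hub.

```python
# Simplified sketch of the os.fork() workaround (hypothetical helper,
# not the actual ssync code): fork one child per sync job so each
# disk read + network send runs in its own process, on its own core.

import os

def run_senders_forked(jobs, send_job):
    """Fork a child per job; return the children's raw exit statuses."""
    pids = []
    for job in jobs:
        pid = os.fork()
        if pid == 0:
            # Child process: do the read/send work, then exit without
            # returning into the parent's event loop.
            send_job(job)
            os._exit(0)
        pids.append(pid)
    # Parent: reap every child and collect its raw exit status.
    return [os.waitpid(pid, 0)[1] for pid in pids]
```

(Unix-only; a production version would cap concurrency and propagate per-job failures rather than just collecting statuses.)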