More Efficient Object Replication (OpenStack Summit Juno)
Copyright©2014 NTT corp. All Rights Reserved.
Developing More Efficient Object Replication on OpenStack Swift 2014/05/16 (OpenStack Juno Design Summit)
Kota Tsuyuzaki Developer (Swift ATC) Advanced Information Processing Technology SE Project NTT Software Innovation Center
Outline
1. Global Distributed Cluster
2. More Efficient Object Replication
3. Benchmark Analysis
Extra: ssync issue
Etherpad: https://etherpad.openstack.org/p/juno_swift_object_replication
1. Global Distributed Cluster
Demands:
• World Wide Services
• Capacity Optimization
• Disaster Recovery
Solution:
• Global Distributed Cluster
1. Global Distributed Cluster
Network Issues:
• High Latency: tens of ms to ~100 ms
• Narrow: 1-10 Gbps
• Expensive: ~$15,000/Gbps/mo
1. Global Distributed Cluster
Network Issues:
• High Latency: handled excellently -> Regions, Affinity Controls
[Figure: two-region topology (Region1, Region2), from the SwiftStack Blog: https://swiftstack.com/blog/]
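The affinity controls mentioned above are configured in Swift's proxy server. The following excerpt is an illustrative setup for the two-region model, assuming the standard proxy affinity options (`sorting_method`, `read_affinity`, `write_affinity`, `write_affinity_node_count`); the region name `r1` is an example and must match your ring.

```ini
# proxy-server.conf (excerpt) -- an example affinity setup for a
# two-region cluster; region names and weights must match your ring.
[app:proxy-server]
sorting_method = affinity
# Prefer reads from region 1 (lower value = higher priority).
read_affinity = r1=100
# Send initial writes to region 1 only; the replicator later moves
# the remaining copies to region 2 over the inter-region link.
write_affinity = r1
write_affinity_node_count = 2 * replicas
```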
1. Global Distributed Cluster
Network Issues:
• Narrow, Expensive: not handled well enough -> ???
Problems:
• Large Amounts of Transfer
• Replication Delay
2. More Efficient Object Replication
Objective:
Reduce the amount of replication traffic transferred between regions
(focus on the narrow network)
2. More Efficient Object Replication
Current Behavior
2. More Efficient Object Replication
Current:
Model: 2 Regions, 3 Replicas, with Write Affinity
[Diagram: a User PUTs an object over the Internet; with write affinity, all copies land on primary and handoff nodes in Region1, which connects to Region2 over the inter-region network]
2. More Efficient Object Replication
Current:
Model: 2 Regions, 3 Replicas, with Write Affinity
[Diagram: the replicator then copies the object from Region1 to the primary nodes in Region2; unfortunately, the same data crosses the inter-region network twice or more]
2. More Efficient Object Replication
Proposed Approach
2. More Efficient Object Replication
Approach:
• Only push to one remote node, chosen based on affinity
• Request that remote node to sync the data to the others
• Changes only a few lines of code in the object-replicator and object-server
[Diagram: Region1 pushes a single copy over the inter-region network ("Only push to one remote"); the receiving node in Region2 syncs the other primaries ("Sync to others")]
2. More Efficient Object Replication
*Additional code

[Object-Replicator]
    find local part suffixes
    for each:
        find other primary locations
        check remote
        if not in remote:
            if (remote region is local) or (remote region not in synced regions):
                push data
                create remote suffix, with a request to sync in the remote region
                add remote region to synced regions

[Object-Server (REPLICATE)]
    create local suffix
    if sync request in header:
        push data to requested remotes
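The replicator-side pseudocode can be sketched in Python. This is a minimal illustration of the decision rule only, not the actual Swift patch: node dicts with a `region` key stand in for ring devices, and the out-of-sync suffix check is omitted.

```python
# Sketch of the proposed replicator decision rule: push to every local
# primary, but to only ONE node per remote region; that node is also
# asked (via the REPLICATE request) to sync its regional peers.
# Hypothetical helper, not the actual Swift object-replicator code.

def choose_sync_actions(primary_nodes, local_region):
    """Return (push_targets, sync_request_targets) for one partition."""
    pushes = []          # nodes we send data to directly
    sync_requests = []   # remote nodes asked to fan out regionally
    synced_regions = set()
    for node in primary_nodes:
        if node['region'] == local_region:
            pushes.append(node)                # same region: push as before
        elif node['region'] not in synced_regions:
            pushes.append(node)                # first node in a remote region
            sync_requests.append(node)         # ask it to sync the others
            synced_regions.add(node['region'])
        # remaining nodes in an already-synced remote region are skipped;
        # the first remote node pushes to them over its local network
    return pushes, sync_requests
```

With 2 regions and 3 replicas, this crosses the inter-region link once per partition instead of up to twice.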
3. Performance Analysis
Objective:
• Analyze replication performance:
  • Total transferred data amount
  • Average network bandwidth between regions
  • One pass time
3. Benchmark Scenario
Model:
• 2 Regions, 3 Replicas
• 1 Gateway Node (GW) between the Regions
Scenario:
• Shape the GW network to 1Gbps
• Stop the object-replicator
• Load objects with Write Affinity: 8MB * 5,000 objects (40GB total)
• Run the object-replicator in once mode (32 concurrency)
Benchmark Patterns:
• Original (ssync)
• Proposed (ssync, rsync)
3. Benchmark Environment
[Diagram: Region 1 (Proxy, Storage1, Storage2 on an Infiniband LAN switch) and Region 2 (Storage3, Storage4 on an Infiniband LAN switch), joined through the GW; all links are 20Gbps, with the inter-region link shaped to 1Gbps; the Client connects over Ethernet]
Storage: CPU: 2 * Intel X5650 2.67GHz (6 core * HT); MEM: 48GB RAM; NIC: 20Gbps Infiniband; Disks: 36 * 3TB SATA (7,200 rpm)
GW: CPU: 2 * Intel X5650 2.67GHz (6 core * HT); MEM: 64GB RAM; NIC: 2 * 20Gbps Infiniband (shaped to 1Gbps)
3. Result (w/1Gbps shaping)
[Chart: One Replication Pass Time (1Gbps); elapsed time (sec), 0-600, for Original / Proposed (ssync) / Proposed (rsync)]
[Chart: Transferred Data on One Pass (1Gbps); transferred data amount (GB), 0-70, same three patterns]
[Chart: Average Network Bandwidth (1Gbps); Gbps, 0-1, same three patterns]
Findings:
- Good reduction in the transferred data amount (Very Good!)
- Only a little decrease appeared in the average network bandwidth
- Good reduction in the one pass time (Very Good!)
-- ssync is more efficient than rsync.
-- The proposed algorithm has a small overhead from waiting for nodes to finish syncing.
-- It can ensure all primary nodes are synced, in a shorter time and with a smaller amount of data transferred.
Theoretical transfer values:
- Original: 40GB * 3 replicas / 2 = 60GB (1/3 has 2 copies in Region2)
- Proposed: 40GB
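The theoretical values can be checked with a little arithmetic (assuming, as the slide does, that 3 replicas spread over 2 regions leave 1.5 copies per object to be pushed across the WAN under the original scheme):

```python
# Theoretical inter-region transfer for the 40GB benchmark load
# (3 replicas, 2 regions, all data initially written to Region1).

loaded_gb = 40                    # 8MB * 5,000 objects, as in the scenario

# Original: every remote primary copy is pushed over the WAN;
# Region2 holds 3 / 2 = 1.5 replicas per object on average.
original_gb = loaded_gb * 3 / 2   # 60.0

# Proposed: exactly one copy per object crosses the WAN; the remote
# node fans out to its regional peers over the local network.
proposed_gb = loaded_gb * 1       # 40

print(original_gb, proposed_gb)   # 60.0 40
```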
Conclusion
1. Global Distributed Cluster
• Needs more efficient replication
2. More Efficient Object Replication
• Affinity-based approach
• Only push to one remote
3. Benchmark Analysis
• Good reduction of data transfer
• Little overhead in one pass time
Acknowledgment: SwiftStack members, Ken Igarachi, Yohei Hayashi, Takashi Shito, Hiromichi Ito, Naoto Nishizono
Discussions
• Is ensuring that all nodes are synced needed?
  • Request the sync at replicate time (current):
    • Pros: able to ensure all replicas are synced
    • Cons: a little overhead from waiting for the sync
  • Do not request the sync; update the replicas asynchronously:
    • Pros: simple
    • Cons: unable to ensure all replicas are synced
• What is a good way to sync the other nodes from the Object-Server?
  • Naive (but very simple, current): reuse an object-replicator instance, which carries unnecessary information (e.g. the Ring)
  • Complex: create a dedicated syncing function or class for the object-server
  • Are there more efficient ways?
Kota Tsuyuzaki
IRC: Kota
Extra: ssync issue
Ssync:
• A replication-process improvement based on HTTP
• A replacement for rsync (designed to be slimmer)
• Sender / Receiver model
Issue:
• Parallel I/O performance problems, (possibly) caused by eventlet
• Cannot access the local disk in parallel (maybe a constraint of the Python VM)
• Slower than rsync in my experiment
• Possible solution:
  • Launch the sender as a subprocess so that another CPU core can be used for the disk read, similar to rsync.
  • When using os.fork(), performance improved to roughly the same level as rsync.
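The os.fork() workaround mentioned above can be sketched as follows. This is a simplified illustration (the helper name and job shape are made up, not the actual ssync patch): each forked child performs its disk read and send in its own OS process, so the reads are no longer serialized behind the parent's eventlet hub.

```python
# Simplified sketch of the os.fork() workaround (hypothetical helper,
# not the actual ssync code): fork one child per sync job so each
# disk read + network send runs in its own process, on its own core.

import os

def run_senders_forked(jobs, send_job):
    """Fork a child per job; return the children's raw exit statuses."""
    pids = []
    for job in jobs:
        pid = os.fork()
        if pid == 0:
            # Child process: do the read/send work, then exit without
            # returning into the parent's event loop.
            send_job(job)
            os._exit(0)
        pids.append(pid)
    # Parent: reap every child and collect its raw exit status.
    return [os.waitpid(pid, 0)[1] for pid in pids]
```

(Unix-only; a production version would cap concurrency and propagate per-job failures rather than just collecting statuses.)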