VastSky - Cluster Storage System for XCP · 2018-03-22

VastSky: Cluster storage system for XCP. Apr. 28th, 2010. VA Linux Systems Japan K.K. Hirokazu Takahashi, Tomoaki Sato, Takashi Yamamoto. Xen Summit AMD 2010. CONFIDENTIAL

Page 1: VastSky - Cluster Storage System for XCP

VastSky

Cluster storage system for XCP

Apr. 28th, 2010

VA Linux Systems Japan K.K.

Hirokazu Takahashi

Tomoaki Sato

Takashi Yamamoto

Xen Summit AMD 2010 CONFIDENTIAL

Page 2: What is VastSky all about?

VastSky is a cluster storage system made up of a lot of servers and disks, from which VastSky Manager creates logical volumes for VMs.

VMs can run directly on VastSky, which XCP can control.

VastSky is scalable and highly available, and has good performance.

Page 3

[Figure: architecture overview. Labels: XCP; VMs; VastSky Manager; control; agents; logical volumes; storage pool; servers.]

Page 4: Announcement

The code of VastSky is now open at http://sf.net/projects/vastsky/

Some more work needs to be done before the first release.

Page 5: Basic Design

A logical volume is a set of several mirrored disks, each of which consists of several physical disk chunks on different servers.

The logical volume won’t lose its data even if a physical disk or a storage server in the storage pool breaks.

All I/O requests, including reads, writes and even re-synchronization requests of mirrored devices, will be distributed across all the physical disks.
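
The placement rule in the basic design can be sketched as follows. This is a minimal illustration, not VastSky's actual allocator: the pool layout (a mapping from server name to free chunk ids), the function names and the "most free chunks first" preference are all assumptions made for the example. The only property taken from the slides is that the chunks of one mirrored disk come from different servers.

```python
def build_mirrored_disk(pool, replicas):
    """Take one free chunk from each of `replicas` different servers.

    `pool` maps a server name to its list of free chunk ids (assumed layout).
    """
    # Assumption: prefer servers with the most free chunks to spread usage.
    candidates = sorted(
        (server for server, chunks in pool.items() if chunks),
        key=lambda server: (-len(pool[server]), server),
    )
    if len(candidates) < replicas:
        raise RuntimeError("not enough servers with free chunks")
    return [(server, pool[server].pop(0)) for server in candidates[:replicas]]


def build_logical_volume(pool, mirrors, replicas=2):
    """Model a logical volume as a list of mirrored disks."""
    return [build_mirrored_disk(pool, replicas) for _ in range(mirrors)]


pool = {"s1": [0, 1], "s2": [0, 1], "s3": [0, 1]}
volume = build_logical_volume(pool, mirrors=3)
for mirror in volume:
    print(mirror)
```

Because every mirrored disk holds chunks on distinct servers, losing one disk or one server removes at most one copy from each mirror in this model.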

Page 6: The way of making a logical volume

[Figure. Labels: logical volume; storage pool (physical disks); servers.]

Page 7: Good performance

All I/O operations are done in the Linux kernel without any interaction with VastSky Manager.

I/O loads of logical volumes, which can be extremely unbalanced, will be equalized between the physical disks.

I/O requests to rebuild mirrored devices are also distributed across a lot of physical disks.
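
A small simulation can illustrate why spreading chunks equalizes unbalanced volume loads. It is a sketch under stated assumptions: the chunk size, the volume names, the chunk-to-disk maps and the request pattern are invented for the example, and mirroring is left out for brevity. In VastSky itself the mapping is applied in the Linux kernel, which this Python model does not attempt to reproduce.

```python
from collections import Counter

CHUNK_SIZE = 1 << 30  # assumed chunk size (1 GiB); not stated in the slides


def locate(chunk_map, offset):
    """Translate a logical byte offset into (physical disk, offset in chunk)."""
    index, inner = divmod(offset, CHUNK_SIZE)
    return chunk_map[index], inner


def disk_load(volumes, requests):
    """Count how many requests land on each physical disk."""
    load = Counter()
    for name, offset in requests:
        disk, _ = locate(volumes[name], offset)
        load[disk] += 1
    return load


# Two 4-chunk volumes whose chunks are spread over the same four disks.
volumes = {
    "hot": ["d0", "d1", "d2", "d3"],
    "cold": ["d1", "d2", "d3", "d0"],
}
size = 4 * CHUNK_SIZE
# The "hot" volume issues 1024 requests, the "cold" one only 8.
requests = [("hot", i * (size // 1024)) for i in range(1024)]
requests += [("cold", i * (size // 8)) for i in range(8)]
load = disk_load(volumes, requests)
print(sorted(load.items()))
```

Although one volume is 128 times busier than the other, each of the four disks in this model serves the same number of requests, because both volumes touch every disk.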

Page 8: Load balancing of read/write requests

[Figure. Labels: logical volumes; storage pool (physical disks); read/write requests.]

Page 9: Load balancing of read/write requests

[Figure. Labels: logical volumes; storage pool (physical disks); read/write requests.]

Page 10: Mirrored disk recovery

Each mirrored disk doesn’t have its own spare disk.

When VastSky detects that one of the physical disk chunks of a mirrored disk has caused an error, VastSky Manager allocates a new chunk from the storage pool and assigns it to the mirrored disk as a spare.

The manager schedules when the spare should be assigned, so that two or more re-sync operations won’t work on the same physical disk.

The mirrored disk starts re-synchronizing the disk chunks right after the spare is assigned.
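
The scheduling constraint above (no two re-sync operations on the same physical disk) could be expressed roughly like this. The `Resync` record, the `schedule` function and the choice to treat both the source disk and the spare disk as busy are assumptions for illustration; the slides do not describe the manager's actual scheduling algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resync:
    mirror: str       # mirrored disk being repaired
    source_disk: str  # physical disk holding a healthy copy
    spare_disk: str   # physical disk holding the newly allocated spare


def schedule(pending, busy_disks=frozenset()):
    """Split pending jobs into those that can start now and those deferred.

    A job starts only if neither of its disks is used by another re-sync.
    """
    busy = set(busy_disks)
    start, deferred = [], []
    for job in pending:
        disks = {job.source_disk, job.spare_disk}
        if busy & disks:
            deferred.append(job)   # assign this spare later
        else:
            start.append(job)
            busy |= disks
    return start, deferred


jobs = [
    Resync("m1", "d1", "d5"),
    Resync("m2", "d1", "d6"),  # shares source disk d1 with m1
    Resync("m3", "d2", "d7"),
]
start, deferred = schedule(jobs)
print([j.mirror for j in start], [j.mirror for j in deferred])
```

In this model the deferred job would be passed to `schedule` again once the re-sync that occupies its disk has finished.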

Page 11: Mirrored disk recovery

[Figure: a mirrored disk (a part of a logical volume) made of physical disk chunks, one of them BROKEN, and the storage pool. Step labels: 1. detect the error; 2. allocate a chunk; 3. assign the chunk as a spare; 3. delete the failure disk.]

Page 12: Load balancing when re-synchronizing the mirrored devices

When a physical disk breaks, VastSky tries to rebuild all the mirrored disks related to that physical disk simultaneously, since its disk chunks belong to different mirrored disks.

There is no need to rebuild a disk chunk that is unused.
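
As a rough sketch of the rule above, the set of mirrored disks to rebuild after a disk failure can be derived from a chunk ownership table, skipping unused chunks. The table layout (a mapping from `(disk, chunk index)` to a mirror name, or `None` when the chunk is unused) and the function name are hypothetical.

```python
def plan_rebuild(chunk_owner, failed_disk):
    """Return the mirrored disks that lost a chunk on `failed_disk`.

    Chunks whose owner is None are unused and need no rebuild.
    """
    return sorted(
        {
            mirror
            for (disk, _index), mirror in chunk_owner.items()
            if disk == failed_disk and mirror is not None
        }
    )


chunk_owner = {
    ("d1", 0): "m1",
    ("d1", 1): None,   # unused chunk: skipped
    ("d1", 2): "m7",
    ("d2", 0): "m1",
}
print(plan_rebuild(chunk_owner, "d1"))
```

Each mirror in the result can be rebuilt independently, which is what allows the recovery I/O to be spread over many physical disks.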

Page 13: Load balancing when re-synchronizing the mirrored devices

[Figure. Labels: logical volumes; storage pool (physical disks); one crashed disk; three spares.]

Page 14: Scalable

Each volume can handle its I/O operations independently. VastSky Manager doesn’t take part in them.

Servers can be added to the system dynamically.

Page 15: How to set up

VastSky should be installed with VM management software such as XCP, which takes care of the VM life cycle.

Network redundancy should be implemented outside VastSky, for example by using a bonding device.

Hardware health checks should also work outside VastSky, and ideally they can tell VastSky which server or disk should be removed.

The current implementation of VastSky requires HA cluster software to detect that its manager is down and restart it.

Page 16: API

VastSky supports an XML-RPC interface and a CUI, with operations like:

Define a logical volume.

Attach the logical volume on a specified server.

Detach the volume.

Notify which disk or server has gone.

Add a new server or a physical disk.

Delete the server or the physical disk.
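
To give a feel for what an XML-RPC call of this kind looks like on the wire, the sketch below builds and parses a request with Python's standard `xmlrpc.client` module. The method name `defineLogicalVolume` and its parameters (a volume name and a size in GB) are hypothetical; the slides list the operations but not their actual names or signatures.

```python
import xmlrpc.client


def build_request(method, *params):
    """Serialize an XML-RPC method call without sending it anywhere."""
    return xmlrpc.client.dumps(params, methodname=method)


# Hypothetical operation name and arguments, for illustration only.
payload = build_request("defineLogicalVolume", "vm1-root", 20)
print(payload)

# Parsing the payload back recovers the parameters and the method name.
params, method = xmlrpc.client.loads(payload)
print(method, params)
```

Against a running manager, the same call would normally go through `xmlrpc.client.ServerProxy`, pointed at whatever endpoint the deployment exposes.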

Page 17: ToDo (1)

XCP integration. Under development.

Improve scalability. Network topology aware volume allocation: when creating a new logical volume, physical disk chunks should be allocated from storage servers close to the server that owns the logical volume.

Logical volume expansion feature.

Snapshot feature for guest volumes.
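
One possible shape for the topology-aware allocation item is to rank storage servers by network distance from the server that owns the volume. Everything here is an assumption for illustration: the `(rack, host)` location model, the three-level distance function and the names are not taken from VastSky, where this feature was still on the ToDo list.

```python
def hop_distance(a, b):
    """Distance between two (rack, host) locations in an assumed topology.

    0 = same host, 1 = same rack, 2 = different rack.
    """
    if a == b:
        return 0
    if a[0] == b[0]:
        return 1
    return 2


def pick_servers(head, servers, replicas):
    """Choose the `replicas` storage servers closest to the head server.

    `servers` maps a server name to its (rack, host) location.
    """
    ranked = sorted(
        servers, key=lambda name: (hop_distance(head, servers[name]), name)
    )
    return ranked[:replicas]


head = ("r1", "h1")
servers = {
    "s1": ("r2", "h9"),
    "s2": ("r1", "h2"),
    "s3": ("r1", "h3"),
    "s4": ("r3", "h1"),
}
print(pick_servers(head, servers, 2))
```

A real allocator would also have to weigh closeness against the requirement that the copies of a mirrored disk sit on different servers, which this sketch satisfies only because it returns distinct server names.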

Page 18: Ideas of how to implement volume snapshot feature

Use dm-snap. It is the easiest way, but it is slow.

Write a completely new implementation like Parallax does, but it will take a long time.

Use OCFS2, which has rich features, but it will be a bit heavy.

Page 19: An idea of using OCFS2

If you place only one VM's volume in an OCFS2 filesystem on a logical volume on a head-server, you can obtain:

A better snapshot mechanism using reflink, a new OCFS2 feature.

Thin provisioning.

The volume can still be moved to another server.

Page 20: Place only one guest's volume in an OCFS2 filesystem

[Figure. Labels: Server A; Server B; VM1, VM2, VM3; migrate (VM2); three OCFS2 filesystems holding VM1's, VM2's and VM3's volumes and their snaps; logical volumes (can be extended; create); physical storage pool.]

Page 21: ToDo (2)

Shared storage for VMs, which some types of active/active clustering software require. The point is the way of rebuilding the mirrored devices:

The way to determine which server should take the job of rebuilding the mirrored device.

Make the rebuilding job and write access to the device mutually exclusive.

Fast VM deployment and cloning. This can be done with the combination of the “shared storage” and “snapshot” features.

Page 22: Fast VM deployment

[Figure. Labels: two VMs, each on its own logical volume (cow); each cow volume is a snapshot of a read-only logical volume (e.g. CentOS installed); storage pool.]
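
The copy-on-write idea behind fast VM deployment can be modelled in a few lines. This is a toy in-memory sketch, assuming a block-indexed base image and a per-VM dictionary of overwritten blocks; it is not how VastSky, dm-snap or OCFS2 reflink are implemented, and the class and variable names are invented.

```python
class CowVolume:
    """A writable volume layered over a shared read-only base image."""

    def __init__(self, base):
        self.base = base   # shared, read-only sequence of blocks
        self.delta = {}    # blocks this VM has overwritten

    def read(self, block):
        # Private data wins; otherwise fall through to the base image.
        return self.delta.get(block, self.base[block])

    def write(self, block, data):
        # Writes never touch the base image.
        self.delta[block] = data


# Base image shared by all clones, e.g. an installed OS.
base = ("boot", "os", "empty")
vm1 = CowVolume(base)
vm2 = CowVolume(base)

vm1.write(2, "vm1-data")
print(vm1.read(2), vm2.read(2), base[2])
```

Creating a new VM volume in this model costs only an empty `delta`, which is why deployment and cloning can be fast when many VMs share one installed image.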

Page 23: ToDo (3)

Make one server able to manage both VMs and a lot of physical disks. Do you really want this feature?

Improve the disk chunk allocation algorithm. Make it disk performance aware.

Graceful server termination. The copies of the chunks in the server should be prepared before the termination.

Make VastSky Manager able to run in a VM. This needs some trick: the information needed to create the volume of the VM for VastSky Manager is stored in that volume itself.

Page 24: Roadmap

First version release:

XCP integration

Make it stable

Performance test

Write documents

The target date is this coming June.

Second version and after: the rest of the ToDo list. What should we do first?

Page 25: Thank you!