STO7535 Virtual SAN Proof of Concept - VMworld 2016

Conducting a Successful Virtual SAN 6.2 Proof of Concept
Paudie O'Riordan, VMware, Inc.
Cormac Hogan, VMware, Inc.
STO7535 #STO7535


Disclaimer

• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.


Agenda

1 Introduction to Session

2 Introduction to Virtual SAN

3 Tools to conduct a successful Virtual SAN proof of concept (POC)

4 POC validation scenarios

5 Data Services Considerations

6 Measuring Performance

This session…

• Virtual SAN has been available since March 2014, almost 2.5 years
• To date, we have almost 5,000 VSAN customers
• VMware recognises that conducting a Virtual SAN proof of concept can be challenging
• Since the launch of Virtual SAN, additional tools for managing, monitoring and troubleshooting Virtual SAN have become available
• This session discusses the tools available to vSphere and Virtual SAN administrators, and how they can help deliver a Virtual SAN proof of concept

Introduction to VMware Virtual SAN

• Storage scale-out architecture built into the hypervisor
• Aggregates locally attached storage from each ESXi host in a cluster
• Dynamic capacity and performance scalability
• Flash optimized storage solution
  – Fully integrated with vSphere and interoperable: vMotion, DRS, HA, VDP, VR …
• VM-centric data operations
• Many new data services

[Figure: Virtual SAN Datastore]

What I Need to Be Successful
Tools to conduct a successful Virtual SAN POC

Before You Begin: Verify Your Components Against the HCL

• VMware Virtual SAN Hardware
• Server, Controller, SSD, Disk on HCL
• Controller Firmware, Driver
• Disk Firmware
• Enclosure Firmware
• SAS/SATA SSD Minimum Firmware is Critical
  – Rule is minimum or higher
• NVMe Firmware
  – HCL lists absolute version only
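
The two firmware rules on this slide differ: SAS/SATA SSDs pass at the listed minimum or anything higher, while NVMe devices must match the listed version exactly. The following is a minimal Python sketch of that comparison. It assumes firmware versions are dotted numeric strings with the same number of fields (real vendor firmware strings often are not), and the function names and device-type labels are hypothetical, not part of any VMware tool.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted numeric string such as '1.2.10' into (1, 2, 10)."""
    return tuple(int(part) for part in version.split("."))


def firmware_ok(device_type: str, installed: str, hcl_listed: str) -> bool:
    """Apply the HCL firmware rule for the given device type.

    'sas_sata': installed version must be the HCL minimum or higher.
    'nvme':     installed version must equal the HCL version exactly.
    """
    if device_type == "sas_sata":
        return parse_version(installed) >= parse_version(hcl_listed)
    if device_type == "nvme":
        return parse_version(installed) == parse_version(hcl_listed)
    raise ValueError(f"unknown device type: {device_type}")
```

Comparing parsed tuples rather than raw strings matters here: as strings, "1.2.10" sorts before "1.2.9", which would wrongly fail a newer SSD firmware.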

Success Tool #1: Health Plugin – Reactive Health Checks

• Introduced with Virtual SAN 6.0
• Incorporated in the vSphere Web Client
• The Virtual SAN Health Check tool includes:
  – General Health
  – Proactive tests
  – Virtual SAN HCL health
  – Physical disk health
• Especially useful when injecting errors into the cluster and verifying that they have been remediated

Success Tool #1: Health Plugin – Proactive Health Checks

• Proactive tools running on Virtual SAN cluster and pre-production tests
  – VM Creation test
  – Storage Performance
  – Multicast performance test

Success Tool #2: Capacity Views

• Dedupe and Compression Savings
• Group by Object Type
  – Filesystem overhead
  – Dedupe overhead
  – Checksum overhead
  – Virtual disks
  – Swap
  – Home namespace

Success Tool #3: Performance Service

• Enable it once
• Integrated with vSphere
• Simplified metrics
  – Backend (VSAN)
  – Frontend (VM)
• Distributed Architecture
  – No SPOF
• Historical data
• Status monitored by health checks

Success Tool #4: HCIbench

• Hyperconverged Infrastructure benchmark
• Based on Vdbench
• Designed to work on distributed architectures like Virtual SAN
• UI Driven
• Free
• Provides results both in text format and in a format that can be viewed in VSAN Observer
• Now available from https://labs.vmware.com/flings

Success Tool #5: RVC/Virtual SAN Observer

• Native tools installed on Linux/Appliance and Windows versions of vCenter Server
• Used for Configuration and Status of the Virtual SAN Cluster
• For Performance and Activity monitoring on demand
  – VM level
  – Host level
  – VMDK level
  – HDD/SSD Level
• Any anomalies will show up with the metric in question shown in red
• Follow the I/O: VM -> VMDK -> Disk Group -> Disk -> Congestion

Success Tool #5: RVC/Virtual SAN Observer (ctd.)

Virtual SAN Operation
• vsan.apply_license_to_cluster
• vsan.enable_vsan_on_cluster
• vsan.disable_vsan_on_cluster
• vsan.clear_disks_cache
• vsan.cluster_change_autoclaim
• vsan.cluster_set_default_policy
• vsan.enter_maintenance_mode
• vsan.fix_renamed_vms
• vsan.object_reconfigure
• vsan.host_wipe_vsan_disks
• vsan.recover_spbm
• vsan.reapply_vsan_vmknic_config

Virtual SAN Information / Virtual SAN Monitoring
• Cluster
  – vsan.check_limits
  – vsan.check_state
  – vsan.cluster_info
  – vsan.cmmds_find
  – vsan.whatif_host_failures
  – vsan.resync_dashboard
• Disk
  – vsan.disk_object_info
  – vsan.disks_info
  – vsan.disks_stats
• Host
  – vsan.host_info
  – vsan.host_consume_disks
• Networking
  – vsan.lldpnetmap
• VM
  – vsan.vm_object_info
  – vsan.vm_perf_stats
  – vsan.vmdk_stats
  – vsan.obj_status_report
  – vsan.object_info
• Troubleshooting
  – vsan.support_information
  – vsan.observer

Validation Scenarios
Expected outcomes from POC activities

PoC Validation

• What are the most important validation tests?

1. Successful VSAN configuration
2. Successful VM deployments on VSAN datastore
3. VM Availability in the event of failures (host, storage device, network)
4. VSAN Serviceability (maintenance of hosts, disk groups, disks)
5. VM Performance meets expectations
6. VSAN Data Services (Dedupe, Compression, RAID-5/6, Checksum) working as expected

Case #1 – Successful VSAN Deployment – Checklist

• Correct vSphere versions
• Appropriate licenses, especially if the PoC is expected to take a long time (> 60 days)
• Correctly Configured Network
  – VSAN requires multicast, so prep the network team
• Minimum of three servers
  – Or 2 servers plus a witness appliance if doing Remote Office/Branch Office (ROBO)

Remember, the VSAN Health Check will do most of this work for you.

Case #1 – Successful VSAN Deployment – Checklist (ctd.)

• Minimum of three servers contributing storage:
  – At least one storage controller – you’ve checked the HCL, and drivers and firmware are valid, right?
  – At least one flash device (SSD, PCIe) for cache – check the HCL
  – At least one magnetic disk (hybrid) or flash device (all-flash) for capacity – check the HCL
• Or consider VSAN Ready Nodes as an option …

Remember, the VSAN Health Check will do most of this work for you.
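
The hardware part of the checklist can be expressed as a small pre-flight check. This is a minimal sketch in Python, assuming an inventory you have gathered by hand; the `Host` class, its field names and `checklist_problems` are hypothetical and do not correspond to any VMware API. It does not replace the Health Check, which inspects the real cluster.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    controllers: int       # storage controllers verified against the HCL
    cache_flash: int       # flash devices (SSD, PCIe) intended for cache
    capacity_devices: int  # magnetic disks (hybrid) or flash (all-flash)


def contributes_storage(host: Host) -> bool:
    """A host contributes storage if it has at least one of each component."""
    return (host.controllers >= 1
            and host.cache_flash >= 1
            and host.capacity_devices >= 1)


def checklist_problems(hosts, witness=False):
    """Return a list of problems; an empty list means the checklist passes."""
    problems = []
    for host in hosts:
        if not contributes_storage(host):
            problems.append(
                f"{host.name}: needs a controller, a cache device and a capacity device")
    contributing = sum(1 for host in hosts if contributes_storage(host))
    # Three servers, or two servers plus a witness appliance for ROBO.
    required = 2 if witness else 3
    if contributing < required:
        problems.append(f"only {contributing} contributing hosts, need {required}")
    return problems
```

For example, two well-equipped hosts fail the check on their own but pass with `witness=True`, matching the ROBO exception in the checklist.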

Case #1 – Successful VSAN Deployment – Device Claiming

• Devices not visible
  – Some RAID controllers won’t present individual disks without RAID configuration
  – May need RAID-0 configuration set on storage devices via controller
• Devices not being claimed
  – Some controllers allow devices to be shared, so devices get presented as “non-local”
  – VSAN will only claim devices that are local
• SSD showing up as HDD
  – Placing devices in RAID-0 will do this
• All-Flash using wrong devices for cache/capacity
  – Set VSAN to “Manual mode” when setting up all-flash
  – Gives control over which devices are used for cache and which devices are used for capacity

Case #1 – Successful VSAN Deployment – Overall health

Run health checks after every test!
Clear Alarms!
Use it to verify that a problem that was previously introduced is now fixed!
Check the Virtual SAN Health Check regularly

Case #2: Successful VM Deployment on VSAN

Use the Health Check – Proactive Tests to do the initial VM deployment check

Part of the Proactive Tests. This will verify whether virtual machines can be created on the VSAN cluster

Case #2: Successful VM Deployment on VSAN

I created a new VM, but where and how is the VM stored?

Component host location

Case #3: VM Availability in the Event of Failures

• Various failures may be introduced as part of a typical POC
  – Host failure
  – Flash device / Magnetic Disk failure – Cache/Capacity device failures
  – Network failure
• Objective: ensure that the VM continues to be available in the event of a failure. The VM may be restarted on another node in the cluster.
• vSphere HA is fully integrated with Virtual SAN so that virtual machines on the failed host are restarted on other hosts in the cluster

Case #3.1: Host Failures

• How many hosts do I really need?
• A minimum of 3 hosts is needed to support VSAN.
• What about rebuilding after a failure or maintenance mode operations?
• If you want virtual machines to remain highly available on VSAN during these scenarios, consider configuring for additional capacity, i.e. a minimum of 4 nodes.
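
The 3-host minimum and 4-host recommendation can be written as a small sizing helper. This Python sketch generalizes them with the 2n + 1 rule for mirrored (RAID-1) objects that tolerate n failures; that rule comes from general Virtual SAN sizing guidance rather than from this slide, and the function names are hypothetical.

```python
def minimum_hosts(failures_to_tolerate: int = 1) -> int:
    """Smallest cluster that can hold mirrored objects tolerating n failures."""
    if failures_to_tolerate < 1:
        raise ValueError("failures_to_tolerate must be at least 1")
    return 2 * failures_to_tolerate + 1


def recommended_hosts(failures_to_tolerate: int = 1) -> int:
    """Add one spare host so rebuilds and maintenance mode have somewhere to go."""
    return minimum_hosts(failures_to_tolerate) + 1
```

With the default of one tolerated failure this reproduces the slide: 3 hosts as the minimum, 4 if virtual machines should stay protected while a host is down or in maintenance mode.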

Case #3.2: Storage Failures

• The Virtual SAN 6.0 Proof Of Concept Guide has details on how to inject temporary disk errors for the purpose of testing.
  – A real disk failure results in immediate rebuild activity initiated by VSAN

• Eject/Offline/Unplug: Absent (VSAN waits 60 minutes before remediation)
• Failure: Degraded (immediate remediation)
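
When planning failure tests it helps to know in advance how long to wait before expecting rebuild traffic. The following Python sketch encodes the two states from this slide as a lookup; the event labels and function name are hypothetical, and the 60-minute value is the delay stated on the slide.

```python
# Component state that results from each kind of test event.
EVENT_TO_STATE = {
    "eject": "absent",
    "offline": "absent",
    "unplug": "absent",
    "failure": "degraded",
}

# Minutes VSAN waits before starting remediation for each state.
REBUILD_DELAY_MINUTES = {
    "absent": 60,    # device may come back, so VSAN waits
    "degraded": 0,   # device has failed, so rebuild starts immediately
}


def expected_rebuild_delay(event: str) -> int:
    """Return the minutes to wait before rebuild activity should begin."""
    state = EVENT_TO_STATE[event.lower()]
    return REBUILD_DELAY_MINUTES[state]
```

A test plan that pulls a disk and checks for resync activity after five minutes would see nothing, which is expected behaviour for the Absent state rather than a fault.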

Case #3.2: Storage Failures (ctd.)

• Additional considerations when dedupe/compression are enabled on VSAN
  – Deduplication and compression hash tables/metadata are spread across all disks in a disk group
  – A single device failure in the disk group will render the whole of the disk group unavailable
  – All data in the disk group will be rebuilt elsewhere in the cluster (if resources allow)

Case #3.3: Network Failure

Part of the Proactive Tests. This will verify whether multicast performance is acceptable for the VSAN cluster

Multicast configuration is the most common issue

Start simple

If you want features like LACP, don’t implement them initially. Turn off QoS/Flow Control, then build it up afterwards

Case #3.4: Validating Rebuild Activity After Failure

• Virtual SAN might need to move data around in the background: policy change, host failure, long term/permanent component loss, user triggered reconfig, maintenance mode, etc.
• The UI Resync Dashboard shows the VMs that are resyncing and the remaining bytes to sync

Remember! Test one thing at a time!

Case #4: VSAN Serviceability – Maintenance Mode

I want to update one of my ESXi hosts in a VSAN cluster. What do I do?

VSAN provides multiple options for maintenance mode

Case #4: VSAN Serviceability – Maintenance Mode

Ensure Accessibility                | Full Data Migration                                | No Data Migration
Loss of VM compliance               | Full VM data compliance                            | No VM availability ensured
Short time maintenance              | More than one hour of maintenance                  | Short time maintenance
Short storage preparation           | Long storage preparation                           | No impact
Limited free storage space required | Free storage space requirements on the other nodes | No impact

Full migration unavailable in 3-node clusters!
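
The trade-offs between the three maintenance mode options can be summarized as a simple decision rule. This Python sketch is one possible reading of the slide, not VMware guidance: it uses the one-hour threshold and the 3-node restriction stated there, and the function and parameter names are hypothetical.

```python
def choose_maintenance_mode(host_count: int,
                            expected_minutes: int,
                            keep_vms_available: bool = True) -> str:
    """Pick a maintenance mode option using the trade-offs on the slide."""
    if not keep_vms_available:
        # Fastest option, but VM availability is not ensured.
        return "No Data Migration"
    # Full migration needs another host to receive the data, so it is
    # unavailable in a 3-node cluster.
    if expected_minutes > 60 and host_count > 3:
        return "Full Data Migration"
    # Short maintenance, or no spare host: keep objects accessible and
    # accept a temporary loss of policy compliance.
    return "Ensure Accessibility"
```

For instance, a two-hour firmware update on a 4-node cluster yields "Full Data Migration", while the same update on a 3-node cluster falls back to "Ensure Accessibility".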

Case #5: Management – Disks Serviceability

The disk serviceability feature enables identification of magnetic disks and flash-based devices that are to be replaced

Case #5: Management – Disk/Disk Group Evacuation

• Allows you to evacuate data from disk groups and individual disks before removing a disk/disk group from a Virtual SAN host
• Allows Virtual SAN to ensure all workloads stay fully compliant with their policy!
  – Supported in the UI, ESXCLI and RVC.
  – Check box in the “Remove disk/disk group” UI screen.


PoC considerations for New Data Services in VSAN 6.2

New Data Services in VSAN 6.2

• Erasure Coding – RAID-5/RAID-6 Support
• Deduplication / Compression
• Checksum
• IOPS limits / QoS

There are performance considerations associated with all of the above. There are also some issues to be aware of!

Capacity Overhead of the New Data Services

• Overheads are all calculated in advance
  – Deduplication/Compression maintain hash tables
    • Approx. 5% overhead
  – Checksum metadata is stored separately from data
    • Approx. 1.2% overhead

Many customers are surprised by the amount of overhead when data services are first enabled

Data Services File System Overheads – Don’t Panic

• Deduplication and Compression File System Overhead is 5% (approx.) of Total Virtual SAN Capacity
• Checksum Overhead is approx. 1.2% of capacity
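
To set expectations before enabling the data services, the approximate overheads can be worked out up front. This is a minimal Python sketch using the two percentages quoted above; both are approximations, actual values vary per cluster, and the function name is hypothetical.

```python
DEDUPE_COMPRESSION_OVERHEAD = 0.05  # approx. 5% of total Virtual SAN capacity
CHECKSUM_OVERHEAD = 0.012           # approx. 1.2% of capacity


def data_service_overhead_tb(total_capacity_tb: float,
                             dedupe_compression: bool = True,
                             checksum: bool = True) -> float:
    """Estimate capacity (in TB) reserved by the enabled data services."""
    overhead = 0.0
    if dedupe_compression:
        overhead += total_capacity_tb * DEDUPE_COMPRESSION_OVERHEAD
    if checksum:
        overhead += total_capacity_tb * CHECKSUM_OVERHEAD
    return overhead
```

On a 100 TB datastore with both services enabled this comes to roughly 6.2 TB shown as used before any virtual machine data is written, which is the effect the slide title warns about.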


How to Measure Virtual SAN Performance?

How to Test Performance…

• Distributed architecture => best performance when the pooled compute and storage resources in the cluster are well utilized.
• This usually means a number of VMs, each running the specified workload, should be distributed across the cluster and run in a consistent manner to deliver aggregated performance.
• This part of an evaluation can be complex and time-consuming
• Real application workloads are best, but …
  – synthetic workloads (IOmeter) might be easier to set up
  – simplistic workloads don’t really reflect what Virtual SAN can do
• Worth a read: Pro Tips For Storage Performance Testing
  – http://blogs.vmware.com/storage/2015/08/12/tips-storage-performance-testing/
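
Because the number that matters is aggregated performance across many VMs, per-VM results need to be combined. The Python sketch below shows one reasonable way to do that: IOPS are summed, and latency is averaged with each VM weighted by its IOPS so that busy VMs count for more. The dictionary keys and function name are hypothetical and do not reflect the output format of any particular benchmark tool.

```python
def aggregate_results(per_vm_results):
    """Combine per-VM results into cluster-wide IOPS and mean latency.

    per_vm_results: list of dicts with 'iops' and 'latency_ms' keys.
    """
    total_iops = sum(r["iops"] for r in per_vm_results)
    if total_iops == 0:
        return {"iops": 0, "latency_ms": 0.0}
    # Weight each VM's latency by the number of I/Os it actually issued.
    weighted_latency = sum(
        r["iops"] * r["latency_ms"] for r in per_vm_results) / total_iops
    return {"iops": total_iops, "latency_ms": weighted_latency}
```

A plain average of per-VM latencies would overstate the influence of lightly loaded VMs, which is why the weighting is used here.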

Performance Testing Considerations (Primarily for Hybrid)

Is the test utilising the distributed storage resources of Virtual SAN?
• Multiple VMs across multiple hosts deliver better performance than one VM on one host.

Is the working set fully in cache, utilising flash performance?
• Read-cache misses will incur latency. Is the workload cache friendly?
• Sustained sequential write workloads fill cache, which must then be destaged. Mixed R/W workloads with repeat patterns are best.

Is the cache warmed if using VSAN hybrid?
• Initial results from the start of a test will not be reflective of overall performance.

Warning: Make sure the dedupe scrubber is disabled. It causes performance issues on hybrid *

* KB 2146267

Performance Test with HCIbench/vdbench

• VMs will be distributed equally across all hosts
• Select I/O size
• Select R/W ratio
• Select random/sequential
• Select duration of test
• Disks can be zeroed with “dd”*
• VMs will be removed (optionally) when test completes
• Produces results per VM
  – IOPS, Latency, Throughput, etc.
• Produces results consumable by VSAN Observer

* Avoid zeroing disks if deduplication enabled – will create hot-spot
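
The four selections listed on this slide (I/O size, R/W ratio, random/sequential, duration) map onto standard Vdbench workload parameters. The Python sketch below builds a small Vdbench parameter file from them. It is an illustration of the mapping, not the file HCIbench itself generates; the device path `/dev/sdb`, the 5-second reporting interval and the function name are assumptions.

```python
def vdbench_profile(xfersize: str = "4k",
                    read_pct: int = 70,
                    random_pct: int = 100,
                    elapsed_s: int = 3600,
                    lun: str = "/dev/sdb") -> str:
    """Build a minimal Vdbench parameter file as a string.

    xfersize:   I/O size, e.g. '4k' or '64k'
    read_pct:   percentage of reads (R/W ratio)
    random_pct: 100 for fully random, 0 for fully sequential
    elapsed_s:  duration of the test in seconds
    lun:        device under test (hypothetical path)
    """
    if not 0 <= read_pct <= 100 or not 0 <= random_pct <= 100:
        raise ValueError("percentages must be between 0 and 100")
    lines = [
        # Storage definition: the raw device, opened with direct I/O.
        f"sd=sd1,lun={lun},openflags=o_direct",
        # Workload definition: I/O size, read percentage, seek percentage.
        f"wd=wd1,sd=sd1,xfersize={xfersize},rdpct={read_pct},seekpct={random_pct}",
        # Run definition: uncapped I/O rate for the chosen duration.
        f"rd=run1,wd=wd1,iorate=max,elapsed={elapsed_s},interval=5",
    ]
    return "\n".join(lines) + "\n"
```

Calling `vdbench_profile("8k", 50, 0, 600)` describes a ten-minute, 8 KB, 50% read, fully sequential run.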


Q & A
