Software-defined storage: what can it do for you · 2020-05-01 · storage using free,...

SOFTWARE-DEFINED STORAGE: WHAT CAN IT DO FOR YOU?

Mikhail Gloukhovtsev
Senior Cloud Solutions Architect
Orange Business Services


2015 EMC Proven Professional Knowledge Sharing 2

Table of Contents

1. Introduction
2. Definition of Software-defined Storage
3. Why Has the SDS Concept Emerged?
4. The Main Capabilities of SDS and What Value They Provide to Customers
5. Does Every Company Need SDS?
6. Co-existence of Traditional Storage Platforms and SDS
7. SDS Relies on Hardware Innovations
8. Hardware Support Is a Must-Have Even for SDS
9. SDS Vendors, Products, and Solutions
9.1 What Features to Consider While Selecting an SDS Vendor and Product
9.2 A Variety of SDS Platforms
9.3 EMC ViPR
9.4 EMC ECS Appliance
9.5 EMC ScaleIO
9.5 EVP: Federation Software-defined Data Center
9.6 VMware VSAN
9.7 VMware EVO:RAIL
10. Conclusion
11. References

Disclaimer: The views, processes, or methodologies published in this article are those of the

author. They do not necessarily reflect the views, processes or methodologies of EMC

Corporation or Orange Business Services (my employer).

1. Introduction

When I first heard about software-defined storage (SDS) at a technical conference two years

ago, I was confused: software has always defined a properly designed infrastructure, has it not?

For example, redundant array of independent disks (RAID) sets, well known for more than 30

years, can be seen as software-defined storage. Is “software-defined storage” part of what is

covered by a marketing phrase – “Software-defined Everything”1 – that has been called the

"next big thing"? How can “software-defined storage” be defined as an IT term to prevent

misusing it in a chain of “software-defined X” constructs like “software-defined radio”2?

Furthermore, for many years some storage vendors claimed the advantage of their “hardware-

based performance” products and it appeared to make sense. While I understood that new

software releases come much more frequently than new application-specific integrated circuit

(ASIC) types, should it be seen as the main benefit of software-defined storage?

Christos Karamanolis (VMware, Office of the CTO) wrote that 2012 was the year of “software-

defined data center.”3 It was the year when the term “software-defined data center (SDDC)” was

coined by VMware’s former chief technology officer (CTO), Dr. Steve Herrod. The following

years have witnessed the emergence of a new data center concept described by

various terms – software-defined data center (SDDC, VMware),3,4 application-centric

infrastructure (ACI, Cisco),5 software-defined environment (SDE, IBM),6 software-defined

infrastructure (SDI, Intel),7 federated software-defined data center (EMC, VMware, Pivotal –

EVP).8 SDS can follow the development and acceptance of “software-defined networking”

(SDN) that has gained popularity as a component of the SDDC. We see the SDS buzzword in

many online and print trade magazines and hear it at almost every technical briefing on the

next-generation data center.

Is SDS really the next big thing in storage technology? Or is it just hype generated by the

marketing machine, as Rich Castagna, the editor of Storage Magazine,9 and other critics of SDS

point out? Valdis Filks, research director for Storage Technologies and Strategies at Gartner

Research, ironically discussed whether SDS is, in fact, the re-labeled storage resource

management, a kind of SRM 2.0 – a creature of “surreally defined marketing.”10

If SDDC as an umbrella term for all the derivatives mentioned above (ACI, SDE, SDI, and EVP)

is a semantic construct, SDS as an SDDC component is considered by SDS critics as “a

synonym for private storage clouds, which is a synonym for Storage as a Service, which is a

synonym for managed storage.”10 Critics of the view of SDS as a complete replacement of

hardware-centric storage point out that SDS relies on continuing progress in storage hardware

development.9 Indeed, innovations in hardware technologies such as new powerful processors

from Intel and flash storage are enablers for SDS.

This article is my attempt to find answers to the questions of what SDS is and how we can

separate marketing myths and reality. I review the benefits of SDS, challenges in developing

this technology, how SDS is related to the broader concept of SDDC, and SDS use cases. How

will various non-IT companies include SDS in the storage services roadmaps they develop to

meet business requirements? As large investments have been made in the existing storage

environments, it is important to understand whether traditional storage platforms can coexist

with SDS. Or will legacy storage be converted into SDS?

To answer these questions, we need to understand first how SDS is defined.

2. Definition of Software-defined Storage

Storage vendors offer various definitions of software-defined storage (SDS) but there is no

generally accepted definition at this time. The common element in all the definitions is

"hardware independence," as hardware-agnostic storage solutions allow users to deploy

storage on hardware they choose, including commodity hardware, and thereby avoid vendor

lock-in for their future storage purchases. In the “Software-defined Storage” Working Draft,11 the

Storage Networking Industry Association (SNIA) proposes an SDS definition via attributes and

functionality rather than giving a brief definition. In my opinion, this definition has been chosen

with the goal of making such a definition applicable to the broad trends in SDS development.

In fact, the SNIA Working Draft considers the SDS implementation model the main differentiator

of SDS: “Data Services can be executed either in servers, storage, or both spanning the

historical boundaries of where they execute.”11 They can run on any storage device and support

many different data types and access protocols.

Gartner defines SDS as “an architectural vision that includes the principles of orchestration,

instrumentation and automation. [It] can be fully realized only by a standards-driven integration

of heterogeneous storage hardware and software platforms.”12

Other definitions of SDS focus on separation of data and control planes: “SDS layers a control

plane for applications and policy on top of a data plane, which essentially manages information

across various forms of infrastructure from on premise to the cloud.”13 Or they underscore the

hardware-agnostic feature of SDS: “SDS is any storage software stack that can be installed on

any commodity resources (x86 hardware, hypervisors, or cloud) and/or off-the-shelf computing

hardware.”14 The control plane becomes a centralized storage resource management (SRM)

service capable of managing pools of heterogeneous resources across the entire data center.

VMware’s definition of SDS is VM-centric: “Software-defined Storage (SDS) is the vision that

storage services are dynamically created and delivered per VM and controlled by policy.”15 The

VMware vision of SDS assumes the transition of storage services from hardware-centric arrays

to a VM-centric environment. This will lead to alignment of the storage services with application

requirements.

The SNIA concept of SDS emphasizes platform independence of SDS, allowing customers to

use commodity hardware. At the same time, it allows for the integration of

traditional storage and SDS, in which SDS supplements the existing storage platform by

providing new features or enhancing the existing functions of specialized hardware.

Figure 1: The SNIA Vision of Software-defined Storage (Ref. 11)

According to the SNIA Working Draft,11 other attributes of SDS are scale-out capability, use of

storage resource pools, ability for incremental growth, management automation, self-service

interface for users, and ability to set policy for managing the storage and data services (Figure

1).

3. Why Has the SDS Concept Emerged?

The SDS concept was put forward in 2012 for a few reasons. Historically, the idea of providing

storage using free, non-proprietary storage software running on commodity hardware is not

new. For example, Ceph is a free, open-source software storage platform developed in 2007

and designed to run on commodity hardware to present object, block, and file storage from a

single distributed computer cluster scalable to the exabyte level.16 GlusterFS, an open-source

software-based network-attached filesystem that deploys on commodity hardware and is

capable of scaling to several petabytes and handling thousands of clients,17 is another example of

what can be called SDS today.

The term SDS for this type of storage product has become popular for describing a general

category of hardware-agnostic storage platforms as their development gains momentum and

new business needs emerge. These new business requirements resulted from storage service

pain points – fast data growth with the advent of Big Data, inability to meet quick changes in

business processes and related application workloads, growing storage TCO, and security

challenges – the list is not short.

Next-generation applications such as Big Data Analytics and cloud-based applications led to the

concept of application-aware storage in the application-centric data center. The traditional

storage platforms are not designed for cloud and Big Data applications, having a fundamentally

different architecture.18,19 These new applications, which require hyper-scalability, use standard

communication protocols to facilitate universal access and interoperability, mix both structured

and unstructured content, process massive data sets, and must store both object- and block-

access content. They need a storage platform that can support many different data types and

access protocols independently of the hardware. This has been characterized as the emergence of

the Third Platform20 driven by cloud computing, Big Data, social media, and mobile computing.

The Third Platform includes the SDS concept as a component of the broader vision of the

Software-defined Data Center (SDDC).

Figure 2: Business Drivers for Software-defined Storage (Ref. 21)

With all these new business requirements (Figure 2), the purpose of storage virtualization

becomes not consolidation, as it was in the recent past, but rather storage agility requiring

storage service provisioning on demand. As the application landscape is moving from “many

applications on one server” to “one application on many servers” deployment, the role of scale-

out storage technologies is gaining momentum.

There has been a confluence of new business needs for storage services with the post-

recession mind-set of consumers no longer accepting the escalating cost of storage and looking

for “doing more with less.” Consumers have understood that simply adding more capacity is

unsustainable: they cannot continually purchase more storage as their primary storage reaches

the capacity limit. It has become clear that the traditional storage services based on static, slow-

to-respond, hardware-centric storage infrastructures cannot scale economically. Growth of the

data residing on proprietary storage hardware often leads to premature rip-and-replace

upgrades in order to meet new performance and capacity requirements. Companies start to

compare their cost of traditional storage service ($ per GB) with the cost of storage services

offered by public cloud providers at a very low cost per GB. The end-users begin to demand

that their companies deliver storage service with more cloud-like features. If their company

cannot provide it, they can turn to cloud provider services instead. This has led to

end-users’ investment in “shadow IT.” Thus, the fundamental shift in customer expectations for

storage services is one of the factors which have brought SDS into existence.

Emerging hyperscale computing environments that work with multi-petabytes of storage and

tens of thousands of servers change the storage economics. Let us look at the numbers: at

traditional terabyte scale, a reduction of just one cent in the storage service cost ($0.01/GB) results

in savings of up to $1,000 per month for 100 TB of storage. However, the same cost reduction

($0.01/GB) for petabyte-scale storage (for example, 100 PB) results in cost savings of

$1,000,000 per month. This would be something to tell the CFO about. Using SDS on

commodity hardware platforms allows consumers to bring hyperscale economics to their data

centers, making them software-defined data centers.
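The arithmetic behind these figures can be checked with a short sketch. It assumes decimal units (1 TB = 1,000 GB, 1 PB = 1,000,000 GB) and that the $0.01/GB reduction applies to a monthly service price; the function name is illustrative only.

```python
from decimal import Decimal

GB_PER_TB = 1_000        # decimal units assumed
GB_PER_PB = 1_000_000


def monthly_savings(capacity_gb: int, reduction_per_gb: Decimal) -> Decimal:
    """Monthly savings from lowering the per-GB service price."""
    return capacity_gb * reduction_per_gb


reduction = Decimal("0.01")  # one cent per GB per month (assumed billing unit)
terabyte_scale = monthly_savings(100 * GB_PER_TB, reduction)
petabyte_scale = monthly_savings(100 * GB_PER_PB, reduction)

print(f"100 TB: ${terabyte_scale:,.0f} per month")
print(f"100 PB: ${petabyte_scale:,.0f} per month")
```

The same one-cent change is a thousand times more valuable at 100 PB than at 100 TB, simply because the capacity it applies to is a thousand times larger.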

Hyperscale and Big Data applications have become the new normal, and wish lists of many

storage customers now include cloud-type scalability to thousands of nodes and support for

multiple petabytes of data. Storage services should be able to manage Big Data Analytics, be

“application workload smart,” and provide flexibility, intelligence, security, and

cost-effectiveness. As proprietary hardware solutions struggle to deliver these features in a

cost-effective manner, developing intelligent storage software that allows consumers to build

massively scalable unified storage with commodity hardware has emerged as a solution. This is a

conceptual shift from solutions developed for infrastructure based on reliable expensive

hardware with unreliable software to an architecture using reliable software running on

unreliable commodity hardware.
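A minimal sketch of the "reliable software on unreliable hardware" idea: the software writes each object to several nodes so that a read still succeeds after one node fails. The class names, the replica count of 2, and the hash-based placement are assumptions made for illustration, not a description of any product discussed in this article.

```python
import hashlib


class Node:
    """A commodity server that may fail at any time."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.objects: dict[str, bytes] = {}


class ReplicatedStore:
    """Keeps `replicas` copies of every object on distinct nodes."""

    def __init__(self, nodes: list[Node], replicas: int = 2) -> None:
        if replicas > len(nodes):
            raise ValueError("need at least as many nodes as replicas")
        self.nodes = nodes
        self.replicas = replicas

    def _placement(self, key: str) -> list[Node]:
        # Deterministic start position derived from the key; the next
        # nodes in ring order hold the remaining copies.
        start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.replicas)]

    def put(self, key: str, data: bytes) -> None:
        for node in self._placement(key):
            node.objects[key] = data

    def get(self, key: str) -> bytes:
        # Read from the first surviving replica.
        for node in self._placement(key):
            if node.alive and key in node.objects:
                return node.objects[key]
        raise KeyError(f"no surviving replica for {key!r}")


nodes = [Node(f"node{i}") for i in range(4)]
store = ReplicatedStore(nodes, replicas=2)
store.put("volume-1/block-0", b"payload")

store._placement("volume-1/block-0")[0].alive = False  # simulate a node failure
recovered = store.get("volume-1/block-0")              # served by the second replica
```

Availability here comes from the placement logic rather than from redundant components inside any single device; adding nodes extends both capacity and the number of failures the cluster can absorb.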

SDS can be seen as an evolution of legacy storage virtualization through creative destruction.18

Traditional block-based storage virtualization solutions are based on a virtualizer running on an

array-controller, appliance, or SAN-switch blade. SDS is more than just storage virtualization. In

the SDS realm, a storage virtualizer evolves into a storage hypervisor like the VM-centric

storage hypervisor developed by DataCore (SANsymphony-V). The storage hypervisor provides

a higher level of software intelligence capable of delivering storage services on commodity

hardware without relying on any ASIC-built storage functionality.

4. The Main Capabilities of SDS and Value They Provide to Customers

The SNIA Working Draft on SDS11 lists the following capabilities SDS platforms should include:

Automation

Standard Interfaces

Virtualized data path

Scalability

Automation is of key importance, as storage users are moving to hyperscale and cloud-based

computing. Automated policies used for storage management functions, such as storage

provisioning, automated dynamic storage tiering, and information life cycle management (ILM),

simplify the management tasks that can be orchestrated by implementing a storage service

catalog. This improves service metrics such as the amount of storage managed per storage administrator

(TB/FTE) and, therefore, reduces the cost of providing storage services.

Standard interfaces are required for programmatic control of storage infrastructure via open and

standards-based APIs. The most common of them is Representational State Transfer (REST),

which is widely used in clouds and other web-based and network-based services. By contrast, a

traditional storage platform mainly uses its own proprietary APIs and management tools.
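As a sketch of what programmatic control through a REST interface could look like, the example below builds (but does not send) a provisioning request with Python's standard library. The endpoint URL, the JSON field names, and the pool name are hypothetical; a real SDS product defines its own resource paths and payloads.

```python
import json
from urllib.request import Request


def build_provision_request(base_url: str, name: str, size_gb: int, pool: str) -> Request:
    """Build an HTTP POST asking a (hypothetical) SDS controller for a volume."""
    payload = {"name": name, "size_gb": size_gb, "pool": pool}  # hypothetical schema
    return Request(
        url=f"{base_url}/volumes",  # hypothetical resource path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


request = build_provision_request("https://sds.example.com/api", "app-data", 500, "gold")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Because the request is plain HTTP with a JSON body, the same call can be issued from an orchestration tool, a script, or a self-service portal without any vendor-specific client.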

Virtualized data path is related to block, file, and object interfaces that support applications

using these interfaces. The unified SDS platform does not need to use different hardware

platforms for every data type and access protocol.

Scalability is critical, as users want to be able to scale the storage infrastructure in a cost-

effective manner without disruption to availability or performance.

Other features include federation capability, which allows customers to create a large-scale

solution that aggregates disparate storage sources into a single pool with data mobility within

the pool. This significantly improves storage resource utilization and makes it possible to avoid

creating separate storage silos. SDS should enable mixing and matching hardware types if

needed.

Automated policies used to maintain service levels and match requirements with capabilities

allow storage administrators to focus on higher-level tasks rather than spending time fixing

run-and-maintain problems.
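One way to picture policy-driven matching of requirements with capabilities is the sketch below: each storage pool advertises its capabilities, and a request is placed on the cheapest pool that satisfies the policy. The pool names, capability fields, and numbers are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pool:
    name: str
    max_latency_ms: float  # worst-case latency the pool commits to
    replicated: bool       # whether data is replicated to a second site
    free_gb: int
    cost_per_gb: float


@dataclass(frozen=True)
class Policy:
    max_latency_ms: float
    needs_replication: bool
    size_gb: int


def select_pool(pools: list[Pool], policy: Policy) -> Optional[Pool]:
    """Return the cheapest pool whose capabilities satisfy the policy."""
    candidates = [
        p for p in pools
        if p.max_latency_ms <= policy.max_latency_ms
        and (p.replicated or not policy.needs_replication)
        and p.free_gb >= policy.size_gb
    ]
    return min(candidates, key=lambda p: p.cost_per_gb, default=None)


pools = [
    Pool("flash", max_latency_ms=1.0, replicated=True, free_gb=2_000, cost_per_gb=0.50),
    Pool("hybrid", max_latency_ms=5.0, replicated=True, free_gb=20_000, cost_per_gb=0.15),
    Pool("archive", max_latency_ms=50.0, replicated=False, free_gb=500_000, cost_per_gb=0.02),
]

database = select_pool(pools, Policy(max_latency_ms=2.0, needs_replication=True, size_gb=500))
backups = select_pool(pools, Policy(max_latency_ms=100.0, needs_replication=False, size_gb=50_000))
```

The administrator maintains the policy and the pool definitions; the placement decision itself no longer needs manual attention for each request.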

The main benefit for customers from the cost perspective is the fact that SDS decouples the

software from the hardware life cycle. For a hardware-defined platform, typically a mix and

match of multiple system generations is not allowed and hardware upgrades are intricately tied

to the development cycle of the software that is supported on that platform. By decoupling the

software from the hardware life cycle, customers can extend the value of their investment by

using a mix of hardware instances and generations. They can run and federate newer software

on multiple older hardware instances as necessary for greater investment protection.

Another benefit is that SDS provides the customer with flexibility in procuring the storage

platform via multiple delivery models: as software and hardware instances (appliances), as

software only (which means the customer chooses the hardware instance), or as virtual

machines, etc. (see Section 9). As a result, customers are able to leverage existing investments

and gain cost savings. SDS allows customers to homogenize the hardware-vendor

heterogeneous data center at the software layer.22

Table 1 provides a comparison between traditional hardware-centric storage and SDS and

shows how companies can benefit from implementing SDS.

| Features | Traditional Hardware-centric Storage | Software-defined Storage (SDS) |
| --- | --- | --- |
| Reliability | Mature technologies. Stable software. Reliability is built into storage hardware; software protects against rare cases that are not handled by hardware protection features. | SDS is developed under the assumption that the hardware is not necessarily reliable and that the software is responsible for continuing to provide storage services in case of hardware failures. The alternative to high-cost redundancy in a single device is to scale out by adding servers and storage in a distributed SDS environment. |
| Performance/quality of service (QoS) | While implementation of storage pools shared by applications with very diverse I/O profiles makes delivery of predictable performance a challenge, a large install base and many years of use in various industry verticals have resulted in development of excellent best practice guides and performance tuning processes. Resource partitioning (cache, ports, and drives) and storage tiering are used for providing QoS. | Linearly scalable performance. By definition, SDS platforms must incorporate a variety of storage devices of different generations with widely varying performance and reliability characteristics, as they are found in a typical data center. The SDS value is in delivering a predictable quality of service for the workloads running on heterogeneous hardware. SDS can automatically adapt to changing workloads and take a proactive role in ensuring ongoing application performance. |
| Scalability | Scale-up architecture. The growth is constrained within the frame. | Scale-out architecture. Hundreds and thousands of nodes if needed. HA and data redundancy are managed by software. |
| Management/Automation | Proprietary CLI. | Open and standards-based APIs including HTTP/REST. Automation of storage management tasks across all types of storage. |
| Interoperability | Limited. Third-party storage support is implemented through storage virtualization (array controller, SAN, appliance-based). | SDS systems use standard x86 hardware and standard OSes, whose interoperability has already been tested by their vendors. SDS allows customers to homogenize the hardware-vendor heterogeneous data center at the software layer. |
| Agility | Limited by high acquisition CapEx and procurement lead time. | Elasticity: add, move, and remove nodes “on the fly.” |
| Total cost of ownership (TCO) | High because of scale-up design, expensive development of specialized hardware and software, and the interoperability testing. | Low. Investment protection by being able to leverage the existing infrastructure. Decoupling the software from the hardware life cycle enables customers to extend the value of their investment by using a mix of server instances and generations. They can run and federate newer software on multiple older hardware instances as necessary, for greater investment protection. |

Table 1: Comparison of Traditional Storage and Software-defined Storage

Consumers reasonably expect that SDS will provide all the main storage features such as no

single point of failure, automated tiering, thin provisioning, deduplication, cloning, snapshots,

and data replication, as traditional hardware-centric platforms do by using ASICs and

field-programmable gate arrays (FPGAs) for some of these tasks. Since SDS will often be

deployed at multi-petabyte scale, availability must be extremely high, and known failure

scenarios must require no intervention.

5. Does Every Company Need SDS?

SDS vendors like to present their solutions as “the same type of technology Google and

Facebook use.” However, as mentioned by Rich Castagna,9 how many companies even come

close to “Web-scale” (hyperscale) with millions of servers as the marketers like to say? While

some cloud-type applications may greatly benefit from using SDS as discussed in the sections

above (see also Table 1), some business-critical applications may run perfectly well on

traditional storage platforms. While hyperscale computing works well for web companies whose

environments typically consist of a small number of very large applications, in contrast, typical

enterprise IT environments include a larger number of specialized applications. SDS may not be

the best choice for some of them. It is important to understand this in the context of discussions

about application-centric infrastructure – we should be able to choose a storage platform based

on the application storage workload.

The same storage vendors who offer SDS products can strongly recommend using specialized

hardware-based solutions for some applications. For example, while IBM is a major player in the

SDS market,23 the IBM strategy for Big Data Analytics is based on offering the IBM PureData

System for Analytics appliance – formerly known as the IBM Netezza Data Warehouse

Appliance. Netezza minimizes data movement by using hardware acceleration such as FPGAs

to filter out extraneous data as early in the data stream as possible.24

The need to leverage existing storage infrastructure brings two considerations. First, as

traditional storage platforms and SDS should co-exist, we need a storage resource

management system capable of controlling both of them in a simple and effective way.

Second, how difficult is it to move an application from a hardware-centric storage platform to

SDS? Is re-architecting of the application required? Can such a migration be done non-

disruptively while keeping the application online?

6. Co-existence of Traditional Storage Platforms and SDS

Many users are looking at options for leveraging existing traditional storage infrastructure with

SDS solutions that consolidate heterogeneous storage into resource pools. Such leveraging

helps in achieving agility in storage services. Indeed, considering the business dynamics that

lead to changes in storage service requirements, storage growth planning cannot be accurate

even for traditional environments. It is even more true for long-term storage growth procurement

planning for cloud computing and Big Data because of the requirement for rapid elasticity of

storage resources. This requirement can be met in two ways: by adding new

functionality to existing, mainly scale-up storage technologies and by developing new scale-out

cloud-oriented or Big Data-optimized technologies. As the application landscape is moving from

“many applications on one server” to “one application on many servers” deployment, the role of

scale-out SDS technologies is gaining momentum. An important feature of the transformation of

how storage is provisioned and consumed is that new SDS technologies should be seen not as

a replacement of the existing technologies and processes but rather as a complementary

approach.

Combining SDS with traditional storage infrastructure brings its own challenges. For example, if

a storage hardware vendor changes firmware or software, the SDS vendor must make relevant

changes in the SDS product and this multi-vendor driven upgrade process may require effort

from the customers to mitigate its impact on the storage services. New hardware technology

cannot be used until SDS vendors add it to their support lists.

As storage users plan to implement new SDS platforms, they want to be sure the SDS solutions

can be integrated with the current storage environment without compromising the quality and

reliability of the storage services. To meet such user demand for integrating new SDS solutions

with the existing storage platforms, traditional storage vendors such as EMC, NetApp, HDS, HP,

and IBM offer more SDS-like benefits, including increased flexibility, automation, and scalability

in their traditional storage solutions or provide opportunities to modernize the existing

environment by bringing it under SDS management as a way to adopt SDS (see Section 9).

As for migrations to SDS platforms, if a customer decides to move the existing storage to

SDS, SDS systems such as ViPR® (see Section 9.3) are well suited to automating and

orchestrating the migration process.

7. SDS Relies on Hardware Innovations

SDS owes its advent and increasing popularity to recent hardware innovations. More powerful

multi-core CPUs are capable of running the full suite of storage services (provisioning, tiering,

snapshots, deduplication, data replication, encryption, etc.) and still provide CPU resources to

compute and network services. New CPU models that optimize and accelerate specific tasks

with new instruction sets are able to provide performance on par with specialized ASICs. For

example, SHA-1, the cryptographic hash function used by some deduplication mechanisms, is

now accessible directly via x86 instruction sets. Without advanced storage hardware, such as

flash technology, the SDS system might not be able to deliver acceptable performance. For

example, VMware requires flash (SSD) to be in every Virtual SAN (VSAN) node, where it is

used for both read and write cache (see Section 9.6).
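To illustrate how a hash function such as SHA-1 can drive deduplication, the following is a minimal sketch, not any vendor's implementation. The `DedupStore` class, its fixed block size, and its in-memory dictionary are assumptions made for the example; a production system would also have to deal with hash collisions, persistence, and reference counting.

```python
import hashlib


class DedupStore:
    """Toy content-addressed block store: identical blocks are kept once."""

    def __init__(self, block_size=4096):
        self.block_size = block_size  # hypothetical fixed block size
        self.blocks = {}              # SHA-1 hex digest -> block bytes

    def write(self, data):
        """Split data into fixed-size blocks; return the list of digests."""
        recipe = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha1(block).hexdigest()
            # Store the block only if this fingerprint has not been seen yet.
            self.blocks.setdefault(digest, block)
            recipe.append(digest)
        return recipe

    def read(self, recipe):
        """Reassemble the original data from its list of digests."""
        return b"".join(self.blocks[digest] for digest in recipe)


store = DedupStore(block_size=4)
recipe = store.write(b"AAAABBBBAAAA")
print(len(recipe), "blocks written,", len(store.blocks), "stored")
# prints: 3 blocks written, 2 stored
```

The first and third blocks are identical, so only two blocks are physically kept while the recipe still allows the full data to be read back.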

As Rich Castagna emphasized,9 what is important is not that the storage hardware is getting

faster but that it is also getting smarter through built-in intelligence. The concept of the

software-defined data center itself becomes feasible because it relies on the development of intelligent

hardware. It is notable that the latest SNIA Storage Developer Conference (Santa Clara, 2014)

included a section for presentations on hardware development.25,26

8. Hardware Support Is a Must-Have Even for SDS

While SDS can use any industry-standard hardware, running it on hundreds of white-box nodes

using commodity disks with a higher failure rate would keep the IT team quite busy replacing

failed hardware parts and you would have to be sure you can get the required parts quickly.

Would you like to maintain your own spare part warehouse? All in all, hardware remains a

critical component even in SDS environments and you need to get end-to-end hardware support

for your enterprise infrastructure. That is why some SDS vendors sell software and hardware

bundles as appliances (see Section 9). Deploying SDS on the appropriate hardware

configurations for which compatibility has been tested and validated by the vendor and working

with a vendor capable of providing global services and support are critical to an enterprise-

quality SDS implementation.

9. SDS Vendors, Products, and Solutions

SDS vendors fall into two categories: traditional storage vendors that introduce SDS products

while keeping their traditional platforms, and startups. In addition to EMC and VMware, there

are many other vendors, traditional and emerging, bringing SDS products to the market.

Traditional storage vendors (NetApp, IBM, HDS, HP, and others) add SDS solutions to their

product portfolios that have been represented mainly by storage arrays. It gives them an

advantage in providing an integration of new SDS solutions with the existing storage

infrastructure, as discussed in Section 6. They can offer plug-and-play type appliances by

combining SDS with certified hardware platforms. While vendors that do not sell their own

storage hardware accentuate the hardware-agnostic type of their SDS solutions, some of them

establish ecosystems with hardware vendors to offer SDS appliances. The partnerships of some

of these companies with Dell (VMware [EVO:RAIL], Nutanix [Dell XC-series converged

appliances], and Nexenta [NexentaStor]) are representative examples.

OpenStack has played a special role in SDS development as it has helped customers

understand the benefits derived from server hardware vendor independence and they have

started to look for similar hardware independence in storage. The OpenStack Object Storage

service (Swift) and Block Storage service (Cinder) are well-known examples of SDS

services.18

SDS products from traditional storage vendors are sometimes met with suspicion: are these vendors

merely trying to protect their market share? In fact, integrating the existing storage platforms

provided by traditional vendors offers a less disruptive introduction of SDS and, as discussed

above, not all applications can benefit from SDS (Section 5). Furthermore, working with a well-

established vendor is important for enterprise customers. Therefore, the brief overview of SDS

products and solutions presented in this section focuses mainly on EMC SDS platforms as EMC

is a traditional storage vendor and at the same time is becoming a leader in SDS offerings. I will

also outline the VMware SDS platforms (VSAN and EVO:RAIL). Review of product lines offered

by other SDS vendors (NetApp, IBM, HDS, HP, SwiftStack, Gridstore, Scality, Nutanix,

Nexenta, DataCore Software, Zadara Storage, CloudByte, StarWind Software, SimpliVity,

Maxta, and others) is beyond the scope of this article and I suggest the readers find information

about other SDS products and solutions on the vendors’ web sites.

9.1 Features to Consider While Selecting an SDS Vendor and Product

Platform openness comes first. For an SDS supplier to be successful, it must support open

standards and interfaces (e.g. OpenStack). Second, a supplier should offer a full suite of data

services that are independent of underlying hardware capabilities, such as automated provisioning,

tiering, snapshots, thin provisioning, security, data deduplication, replication, and data migration.

Third, flexibility in future technology updates will provide investment protection. You want to be

able to change underlying hardware and hypervisor at a later time or to easily migrate to new

application-specific storage platforms if needed. Furthermore, the selected SDS solution must

match or exceed hardware-based platforms in performance. When you consider an emerging

vendor SDS product, ensure that the vendor provides end-to-end support for its SDS solution

and it meets your service level objectives. Table 1 in Section 4 may serve as a good reference

point for SDS feature review.

9.2 A Variety of SDS Platforms

Today SDS buyers have various options. They can purchase an SDS appliance that bundles

SDS software with hardware or just SDS software and use their own hardware. In the first case,

SDS controls the OS and all the upper layers of the technology stack. This ensures better

stability and performance. In the second option, SDS software that can be downloaded after the

purchase runs as an application on top of the server OS (the server hardware and OS are

provided by the customer). Deployment flexibility is an obvious benefit of this option.

In a virtual storage appliance model, the SDS platform is delivered as a virtual machine to the

customer. Service providers can offer hosting the SDS platform in the cloud and this option can

be used for on-demand scalability.

9.3 EMC ViPR

ViPR27 as an SDS platform addresses a very difficult challenge faced by enterprise IT

departments today: how to achieve an efficient cloud-type operating model in a multi-vendor

storage environment while still using and maximizing existing storage system capabilities.

Customers can buy ViPR Services either as a complete storage appliance with bundled

hardware (EMC ECS Appliance) or as a software product that can be installed on customer-

provided operating systems and hardware as discussed in Section 9.2.

ViPR Architecture. ViPR architecture comprises two components; both of them in the software

layer:

1. ViPR Controller abstracts and consolidates heterogeneous, multi-vendor storage arrays

into a single virtual storage pool that can then be managed by policy. ViPR Controller

automates storage provisioning tasks and provides self-service access to storage.

2. ViPR Services developed entirely in software are data services providing cloud-type

capabilities to storage arrays, including EMC, commodity hardware, and third-party

storage arrays. ViPR services support multiple access methods – object, Hadoop

Distributed File System (HDFS), and block. ViPR Block is available via ScaleIO software

(see Section 9.5 below).

Figure 3: ViPR Platform (Ref. 28)

ViPR 2.1, which was made generally available in September 2014, includes platform support for

EMC XtremIO®.27 This allows customers to bring XtremIO under ViPR management and deliver

Tier 0 storage resources via self-service on-demand.

The ViPR Controller uses software adapters for connecting to the underlying arrays via SMI-S

and public APIs exposed by the arrays. As a consistent and simple REST API is exposed, any

vendor, partner, or customer can build new adapters so that new arrays can be managed by

ViPR.

Once ViPR is deployed in a data center, it discovers the physical infrastructure including storage

systems, Fibre Channel SANs, and hosts so that ViPR can understand the full topology of the

data center. To make the storage devices visible to the hosts, ViPR discovers the physical

storage pools and the storage ports for each storage system registered with ViPR.29

After discovering the physical infrastructure, ViPR administrators define the following ViPR

abstractions:

• Virtual arrays
• Virtual pools
• End-user services using virtual arrays and pools

Virtual Arrays. The ViPR Controller manages storage through virtual arrays and pools

according to automated policies. A virtual array can span multiple physical arrays and

conversely, a physical array can be partitioned into multiple virtual arrays. As the block and file

data paths remain unchanged, data is still managed directly by the underlying block and file

arrays. This allows users to continue to take full advantage of the underlying intelligent storage

array technologies, without introducing I/O latencies.

A virtual array is defined by network connectivity and includes:

• SAN switches/fabric managers within the networks
• IP networks connecting the storage systems and hosts
• Host and storage ports connected to the networks
• ViPR virtual pools (a virtual array is associated with one or more virtual pools)

Virtual Storage Pools. A ViPR virtual storage pool is defined using one or more underlying

physical storage pools. A virtual pool defines a set of storage capabilities that determine the

quality of storage, such as type of disks, thick/thin devices, and protection features such as

snapshots, replication, and high availability. The users can subscribe to virtual storage pools

that meet their unique requirements. The use of the virtual storage pools eliminates storage

silos and enables customers to maximize utilization of storage.
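The relationship between a virtual pool and the physical pools behind it can be sketched as a simple capability match. This is an illustrative model only and does not use the ViPR API; the class names, attributes, pool names, and the "most free space" placement rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalPool:
    name: str
    array: str
    disk_type: str   # e.g. "SSD" or "SAS"
    thin: bool       # supports thin devices
    snapshots: bool  # supports snapshots
    free_gb: int


@dataclass(frozen=True)
class VirtualPool:
    """A named set of required capabilities (hypothetical model)."""
    name: str
    disk_type: str
    thin: bool
    snapshots: bool

    def matches(self, pool):
        # A physical pool qualifies if it offers every required capability.
        return (pool.disk_type == self.disk_type
                and pool.thin == self.thin
                and (pool.snapshots or not self.snapshots))

    def candidates(self, physical_pools):
        return [p for p in physical_pools if self.matches(p)]

    def place(self, physical_pools, size_gb):
        """Pick the matching physical pool with the most free space."""
        fits = [p for p in self.candidates(physical_pools)
                if p.free_gb >= size_gb]
        if not fits:
            raise ValueError(f"no physical pool can satisfy {self.name}")
        return max(fits, key=lambda p: p.free_gb)


pools = [
    PhysicalPool("pool-a", "array-1", "SSD", True, True, 500),
    PhysicalPool("pool-b", "array-2", "SSD", True, True, 2000),
    PhysicalPool("pool-c", "array-2", "SAS", False, False, 8000),
]
gold = VirtualPool("gold", disk_type="SSD", thin=True, snapshots=True)
print(gold.place(pools, size_gb=100).name)  # prints pool-b
```

A user who subscribes to the hypothetical "gold" pool never names an array; the request is satisfied by whichever physical pool on any array meets the stated capabilities.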

There are three types of virtual pools in ViPR: block, file, and data services.29 Data services

virtual pools are used to store object data and are backed by storage on underlying ViPR-

managed file arrays.

Virtual Data Center. A virtual data center is a logical construct that can map to a physical data

center or a part of one. A single virtual data center exists for each ViPR Controller. A virtual data

center that is the top-level resource in ViPR is typically partitioned into virtual arrays for

purposes of fault tolerance, network isolation, or tenant isolation. While geographical co-location

of storage systems in a virtual data center is not required, high bandwidth and low latency are

assumed in the virtual data center.

ViPR can be deployed in a multisite configuration with several ViPR Controllers managing

multiple data centers in different locations. In this type of configuration, ViPR Controllers form a

loosely coupled federation of autonomous virtual data centers.

EMC Platforms Supported by ViPR. At the time of writing this article (January 2015), ViPR

supports the following EMC arrays/platforms: ECS Appliance, Isilon, RecoverPoint, ScaleIO,

Vblock, VMAX, VNX, VPLEX, and XtremIO.

Third-party Platforms Supported by ViPR. ViPR provides support for third-party arrays in two

ways: natively or via OpenStack Cinder:

ViPR provides native support for the following third-party arrays: NetApp FAS (7-mode), Hitachi

AMS 2100, USP-V, HUS VM, and VSP.

ViPR leverages Cinder to extend third-party array support (Dell EqualLogic, HP 3Par, Lefthand,

Huawei HVS, T/Dorado, IBM DS8000, SVC, XIV, Oracle ZFS, and SolidFire).

Starting with ViPR 2.0, the Southbound integration provisioning storage from ViPR via Cinder is

available (Figure 4). For a complete list of supported arrays and supported operations, please

see the ViPR Feature List at the EMC Online Support site.

Figure 4: ViPR Integration with OpenStack (Ref. 30)

Integration with Cloud Stacks. ViPR is integrated with cloud stacks: VMware, OpenStack, and

Microsoft Hyper-V.30 The ViPR integration with OpenStack provides the following advantages:

• ViPR’s third-party array support (multi-vendor) gets expanded.
• When a new vendor is added to the list of Cinder plugins, ViPR gets on-the-fly integration of the new third-party array. No other actions are required on the ViPR side to provide new vendor array support.

ViPR Support for Geo-distributed Deployments. ViPR 2.0 supports the ability to federate

between ViPR Controller instances, either within a data center or between data centers. This

makes it possible to centralize the management of distributed storage environments by using

one interface with single sign-on.

ViPR Object geo-protection provides full protection against site failure due to a disaster or other

problem that causes a site to go offline. The geo-protection layer protects object data across

geo-distributed sites and ensures that applications seamlessly function in case of a site failure.

Component failures are recovered using fragments from the local site without WAN traffic.29

Multiple Access Methods. ViPR Services support industry-standard object storage APIs

(Amazon S3, OpenStack Swift, and EMC Atmos®) and are compatible with any Hadoop 2.x

distribution, such as Cloudera, Hortonworks, Pivotal, etc. ViPR/ECS Appliance enables

simultaneous access to data with multiple interfaces (see Section 9.4).

ViPR Installation Options. Customers can procure and install ViPR in three ways:

1. As a software product that can be installed and run on customer-provided traditional

EMC or third-party arrays, such as VMAX, VNX, or Isilon.

2. As a software product that can be installed on customer-provided operating systems

(OS) and commodity storage hardware.

3. As a complete storage appliance which includes ViPR Controller, ViPR Services, and

bundled commodity hardware - this is the ECS Appliance described in Section 9.4.

Use Cases. The ViPR Controller software can manage and automate a heterogeneous storage

environment with arrays. As the ViPR services software can be deployed on commodity

hardware, a customer can build their own Third Platform architecture. This enables the

transition from the Second to Third Platform.20 Some of the ViPR benefits for delivering storage

services are presented in Table 2.

| Feature | Description |
| --- | --- |
| Simplicity | Transforms multi-vendor storage into a single software platform |
| Automation | Reduces time to provision storage by up to 85% |
| Agility | Delivers policy-based on-demand storage services |
| Flexibility | Seamlessly integrates with EMC, third-party and commodity platforms as well as with Cloud Stack and supports industry standard APIs |

Table 2: Use Cases for ViPR

9.4 EMC ECS Appliance

As mentioned in Section 9.3, customers have the option to deploy ViPR Services on their own

commodity infrastructure or to purchase ViPR Services on an integrated, scale-out commodity

appliance. While each of these options provides the economics of commercial off-the-shelf

software (COTS), some customers may prefer the ECS appliance as a fully integrated platform

with complete management and support from EMC (see Table 3).

| Feature | ViPR Commodity | ECS Appliance |
| --- | --- | --- |
| Package | Software only. BYO Hardware | Appliance (HW+SW) |
| Services | Object, HDFS, Block | Object, HDFS, Block |
| Network Infrastructure | Customer provided | EMC provided |
| OS Deployment and Patching | Customer managed | Managed by EMC |
| Support | Joint support: customer and EMC | Fully supported by EMC |

Table 3: Comparison of ViPR Commodity and ECS Appliance

Elastic Cloud Storage (ECS) Appliance (formerly known as “Project Nile”)31 is available in three

primary configurations supporting unstructured, block, and mixed use cases. Both the

unstructured and block engine deployments are designed to scale across racks as part of a

single logical server cluster.

The ECS Appliance main characteristics are:

• Use COTS components
  o Economies of scale
• Density-optimized
  o Up to 72 TB raw/rack unit
  o Saves power/GB, real estate costs, etc.
• Labor-optimized
  o Manage the cluster, not the devices
  o Maximize serviceability
• Protection efficiency
  o Geo-efficient storage

9.5 EMC ScaleIO

ECS Appliance delivers block storage via ScaleIO that is integrated in ViPR. Similar to VMware

VSAN (see Section 9.6 below), EMC ScaleIO is an SDS hyperscale converged server-SAN

solution for commodity platforms.32 It creates a virtual pool of block storage by leveraging HDDs,

SSDs, and PCIe flash cards from local servers.

Architecture. EMC ScaleIO combines application servers' storage and compute resources

into a single architectural layer, creating an on-demand, low-cost server SAN as an alternative to

existing SAN arrays. All I/Os and throughput are collective and accessible to any application

within the cluster. With EMC ScaleIO, storage is just another application running alongside other

applications, and each server is a building block for the global storage and compute cluster.

Figure 5: EMC ScaleIO Architecture (Ref 33)

The ScaleIO components are the ScaleIO Data Client (SDC), the ScaleIO Data Server, and the ScaleIO

Metadata Manager (MDM) (Figure 5). The SDC is a block device driver that runs locally on any

application server and shares block volumes with applications. When the local application

issues an I/O request, the SDC fulfills it regardless of the block’s location. SDCs maintain full data

location awareness so that they can direct I/O requests to the right ScaleIO Data Server. The ScaleIO Data Server

contributes local storage to the aggregate ScaleIO storage pool. All servers participate in servicing

I/O requests using massively parallel processing. The MDM configures and monitors the health of the global

ScaleIO system.
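The idea of a client that knows where every piece of a volume lives, and of all servers sharing the I/O load, can be sketched as follows. This is not ScaleIO's actual placement algorithm, which the source does not describe; the hash-based mapping, the function name `replica_servers`, and the node and volume names are assumptions used purely for illustration of two-copy mirroring across servers.

```python
import hashlib
from collections import Counter


def replica_servers(volume, chunk_index, servers, copies=2):
    """Return the distinct servers holding the copies of one volume chunk.

    Hypothetical rule: hash the chunk identity to pick a starting server,
    then place the remaining copies on the following servers in the list.
    """
    if not 1 <= copies <= len(servers):
        raise ValueError("copies must be between 1 and the number of servers")
    key = f"{volume}:{chunk_index}".encode()
    start = int.from_bytes(hashlib.sha1(key).digest()[:8], "big") % len(servers)
    return [servers[(start + i) % len(servers)] for i in range(copies)]


servers = ["node-1", "node-2", "node-3"]

# Count how many chunk copies each server ends up holding.
load = Counter()
for chunk in range(3000):
    load.update(replica_servers("vol-42", chunk, servers))
print(dict(load))
```

Because the mapping is deterministic, any client can compute a chunk's location without consulting a central controller on each I/O, and the copies of a chunk always land on different servers, so a single server failure leaves one copy available.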

By converging storage and compute, ScaleIO simplifies the architecture and reduces cost

without compromising on any benefit of external storage. The architecture allows for scaling out

from as few as three servers to thousands by simply adding nodes to the environment. ScaleIO

also has self-healing capabilities, which enable it to easily recover from server or disk failures.

Features. ScaleIO features are listed in Table 4.

| Feature | Description |
| --- | --- |
| Scalability | Hundreds to thousands of nodes |
| Performance | Linearly scalable |
| “Brown field” support | Runs on existing infrastructure. Investment protection by reclaiming unused capacity. |
| Elasticity | Adding and removing nodes “on-the-fly,” fully automatic rebalance/rebuild |
| Fault tolerance | 2 (3) copy “meshed” mirroring, protection domains, and advanced failure handling |

Table 4: ScaleIO Features

As storage and compute resource requirements change, ScaleIO can automatically tier data

and rebalance storage distribution to improve performance and capacity usage.

Performance. Every server in the ScaleIO cluster participates in processing I/O operations. The

architecture is parallel so, unlike dual-controller architecture, there are no bottlenecks or “choke

points.” Performance scales linearly and the cost/performance ratio improves with growth.

Performance optimization is automatic; whenever rebuilds and rebalances are needed, they

occur in the background with minimal or no impact to running applications.

Scalability. EMC ScaleIO is designed to massively scale. Unlike most traditional storage

systems, as the number of servers grows, so do throughput and IOPS. Additional storage and

compute resources can be added modularly and as these resources grow together, the balance

between them is maintained.

Management. Managing a ScaleIO deployment is easy because a single layer has less

operational overhead. Administering the ScaleIO deployment does not require specialized

training and/or vendor certification.

Installation Options. ScaleIO can be purchased as software only or with the EMC ECS

hyperscale storage appliance. ECS has five configurations ranging from 360 TB to nearly 3 PB.

As with traditional storage, ScaleIO software and the separately licensed features are priced per

terabyte.

Cost Benefits. ScaleIO provides several cost benefits. First, as a software-only system,

it uses commodity hardware. Second, because it creates a server-based SAN, no

dedicated storage components (e.g., fabric switches, cables, and HBAs) are required. This results

in cost savings due to reduced power, cooling, and space. There are no “forklift” upgrades for

end-of-life hardware as failed disks or outdated servers can simply be removed from the cluster.

The ability for incremental capacity growth eliminates instances of unplanned costs in the

storage budget when more capacity is needed.

Use Cases. Similar to VSAN (see Section 9.6), the popular ScaleIO use cases include VDI, DR,

and testing and development.33

9.5 EVP: Federation Software-defined Data Center

In October 2014 EMC announced the first of five planned EMC Federation solutions - the new

EVP Federation software-defined data center (SDDC) solution integrating technology from the

Federation (EMC, VMware, RSA, and Pivotal).8 Four other solutions expected to be released

soon are:

1. Federation Platform-as-a-Service – EMC, Pivotal, and VMware offer Tier-1 application

support. Pivotal provides Pivotal ONE, which includes Pivotal CF, the Pivotal version of

Cloud Foundry (open-source PaaS layer).

2. Federation Virtualized Data Lake – The Federation Business Data Lake solution comes

primarily from Pivotal for data management and analysis (the Pivotal Big Data Suite:

Pivotal HD for Hadoop, and Pivotal Greenplum Database for structured data).

3. Federation End-User Computing – VMware provides the Horizon Workspace and EMC

provides Syncplicity Sync & Share technology that includes public cloud usability with

the option to use EMC storage systems in the data center: Isilon, Atmos, or VNX.

4. Federation Security Analytics – RSA delivers the security features to build a Security

Analytics platform on top of the Data Lake solution.

The new Federation SDDC that takes advantage of the strong integration between EMC

technologies and the VMware vCloud Suite (Figure 6) is designed to be the base for all of these

future solutions.8 The Federated SDDC is positioned as a bridge to the so-called Third

Platform20 considered as the next generation compute platform that is accessed from mobile

devices, utilizes Big Data, and is cloud-based (see Section 3). While the EVP SDDC solution

delivers capabilities to expand from IaaS to business enabling IT as a Service (ITaaS), its key

value is in reducing the complexity of providing IT services.

The EVP SDDC solution includes the following products:

• EMC ViPR Software-defined Storage Platform
• EMC VNX and/or EMC Symmetrix® VMAX Storage Platforms
• EMC Avamar and EMC Data Domain® Backup and Recovery Solutions
• EMC and VMware Integrated Workflows
• VMware vCloud Suite
• VMware NSX Virtual Networking Technologies
• VMware vCenter Log Insight
• VMware IT Business Management Suite

Figure 6: EVP SDDC Key Components (Ref. 34)

| Feature | Description |
| --- | --- |
| Automation and self-service provisioning | Using VMware vCloud Automation Center (vCAC) integrated with EMC ViPR and VMware NSX to provide the compute, storage, network, and security virtualization platforms |
| Multi-tenancy and secure separation | Secure multi-tenancy through vCAC role-based access control (RBAC). |
| Workload-optimized storage | Using EMC ViPR storage services and the capabilities of VNX and VMAX, this solution provides policy-based management of block and file-based virtual storage |
| Elasticity | Using vCAC and tools provided by EMC, resources can be added dynamically as needed, based on performance requirements |
| Fault tolerance | 2 (3) copy “meshed” mirroring, protection domains, advanced failure handling |
| Monitoring and resource management | These capabilities are based on a combination of VMware vRealize Operations (vROps, formerly vCOps) dashboards, alerts, and analytics, using extensive additional storage details provided by EMC analytics adapters for ViPR, VNX, and VMAX |
| EMC and VMware integration | The EVP SDDC solution contains many integration points between EMC and VMware products (Figure 7) |

Table 5: EVP SDDC Features

Figure 7: EMC ViPR Integration Points with VMware (Ref.34)

9.6 VMware VSAN

VMware first announced a VSAN beta version at VMworld 2013 and made the solution

generally available in March 2014.35 The goals of introducing VSAN are both to lower overall

storage costs and to eliminate the I/O latencies associated with networked storage. The

assumption is that customers using VMware software for server and network virtualization in a

VMware-centric IT environment may want to use VMware software for storage virtualization as

well. VMware’s VSAN represents a hypervisor-converged storage solution using software to

convert commodity server-based storage into a virtual SAN with pooled HDD or SSD capacity

from the hypervisor hosts. EMC ScaleIO, Nutanix, StarWind, and other vendors have provided

similar server-based SDS solutions.

Architecture. VMware VSAN integrated with vSphere aggregates disks locally attached to

servers into a storage cluster. This allows for storage provisioning from VMware vCenter during

virtual machine provisioning operations. The use of the Storage Policy Based Management

(SPBM) platform that has been available since the VMware vSphere 5.5 release enables delivery of

VM-centric storage services.36

A minimum of three vSphere hosts are required to form a VSAN-supported cluster. Each

vSphere host that contributes storage to the Virtual SAN cluster requires a disk controller and

must have at least one flash-based device used for read/write cache and at least one hard disk

drive (HDD) used for data storage. Flash-based devices do not contribute to the overall size of

the distributed VSAN shared datastore. VMware recommends that each vSphere host in a

VSAN cluster has at least one 10 Gbps network adapter.
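The host requirements above can be expressed as a small pre-deployment check. This is a sketch of the rules as stated in the text, not a VMware tool or API; the `Host` class, the host names, and the device sizes are hypothetical, and every listed host is assumed to contribute storage to the cluster.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    flash_gb: list = field(default_factory=list)  # cache-tier device sizes
    hdd_gb: list = field(default_factory=list)    # capacity-tier device sizes


def validate_cluster(hosts):
    """Return a list of rule violations; an empty list means the rules hold."""
    problems = []
    if len(hosts) < 3:
        problems.append("cluster needs at least 3 hosts")
    for host in hosts:
        if not host.flash_gb:
            problems.append(f"{host.name}: needs at least one flash device for cache")
        if not host.hdd_gb:
            problems.append(f"{host.name}: needs at least one HDD for data")
    return problems


def datastore_capacity_gb(hosts):
    """Only HDDs count toward the datastore size; flash is used as cache."""
    return sum(sum(host.hdd_gb) for host in hosts)


cluster = [
    Host("esx-1", flash_gb=[400], hdd_gb=[1200, 1200]),
    Host("esx-2", flash_gb=[400], hdd_gb=[1200, 1200]),
    Host("esx-3", flash_gb=[400], hdd_gb=[1200, 1200]),
]
print(validate_cluster(cluster))       # prints []
print(datastore_capacity_gb(cluster))  # prints 7200
```

Note that the 1,200 GB of flash in this example cluster does not appear in the capacity figure, which mirrors the statement that flash-based devices do not contribute to the size of the shared datastore.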

VSAN utilizes a policy-based approach to define storage attributes like capacity, performance,

and availability. The policies are created on a per-VM basis.

Deployment Options. VMware offers VSAN to customers either through OEMs as VSAN

Ready Nodes or as fully customized solutions. The OEMs (Dell, HP, Cisco, et al) provide VSAN

Ready Nodes that are pre-configured and pre-validated x86 servers compatible with VSAN.

Alternatively, customers can purchase server hardware directly by choosing from about 150

VSAN-certified components.36 Regarding pricing, the VSAN licenses are sold separately from

the server and on a per-CPU basis, in contrast to typical per-terabyte storage licensing (see

the licensing for ScaleIO, Section 9.5).
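The difference between the two licensing models is easiest to see by counting license units rather than prices. The sketch below assumes that "per CPU" means one unit per populated socket and that per-terabyte licensing is rounded up to whole 1 TB increments of raw capacity; the host configuration is hypothetical and no real price list is implied.

```python
import math


def per_cpu_license_units(sockets_per_host):
    """One license unit per CPU socket across all hosts in the cluster."""
    return sum(sockets_per_host)


def per_tb_license_units(raw_tb_per_host):
    """One license unit per started terabyte of raw capacity."""
    return math.ceil(sum(raw_tb_per_host))


sockets = [2, 2, 2, 2]         # four dual-socket hosts (hypothetical)
raw_tb = [3.6, 3.6, 3.6, 3.6]  # raw capacity per host in TB (hypothetical)

print(per_cpu_license_units(sockets))  # prints 8
print(per_tb_license_units(raw_tb))    # prints 15
```

Doubling the disks in each host doubles the per-terabyte unit count but leaves the per-CPU count unchanged, so capacity-heavy clusters and compute-heavy clusters fare differently under the two models.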

Use Cases. The most common use cases for VMware VSAN are VDI, DR, and

development/test environments.36 The reader should note that VSAN is not intended as a

replacement for shared storage arrays.37 Indeed, enterprise storage frames and

midrange arrays having rich data services (high-performance snaps, advanced replication, data-

at-rest encryption, compliance auditing, etc.) retain their place in the data center to provide

storage services for mission-critical applications. A comparison of EMC ScaleIO (Section 9.5)

and VMware VSAN38 is shown in Table 6.

| Feature | EMC ScaleIO | VMware VSAN |
| --- | --- | --- |
| Environment | ScaleIO works in both physical and virtual environments. | VSAN works only in VMware ESXi environments. |
| Supported OS/hypervisor | Supported hypervisors: VMware ESXi, Microsoft Hyper-V, Citrix XenServer, Linux KVM. Supported physical environments: Windows, Linux (RHEL, CentOS, SUSE, Ubuntu on ARM processors). | Only VMware vSphere 5.5 and later. VSAN locks into the vSphere kernel so a virtual operating system does not run on top of the server. |
| Scalability | ScaleIO is designed for "cloud scale" (tens to tens of thousands of nodes). | The maximum VSAN configuration is 32 nodes. |
| Storage media | ScaleIO uses any storage media (HDD, SSD, PCIe flash) as the primary storage location for the application data. A ScaleIO host can consist only of HDDs or SSDs and still participate in a ScaleIO cluster. | VSAN uses only HDDs for storing data and only SSDs to provide read and write caching. A VSAN host with only HDDs or only SSDs will not be allowed to participate in a VSAN configuration (servers require at least one of each). |
| Management | Using the ScaleIO management tools. | Managed via vCenter. |
| Product maturity | ScaleIO has been deployed in production environments for two years. | GA date: March 2014. |
| Licensing | ScaleIO software is licensed in 1 TB raw capacity increments. | Like vSphere licenses, Virtual SAN licenses have per-CPU capacity for all the hosts participating in a cluster. |

Table 6: Comparison of ScaleIO and VMware VSAN Features

9.7 VMware EVO:RAIL

VMware EVO:RAIL, introduced at VMworld 2014, is a natural step after developing VMware

VSAN (Section 9.6), with the goal of combining compute, networking, storage, and management

services as an SDDC stack into an all-in-one hyper-converged infrastructure appliance that is

100% based on VMware software.39 The EVO:RAIL SDDC stack includes VMware vSphere

Enterprise Plus and ESXi, vCenter Server, VMware Virtual SAN (VSAN) for storage, and

VMware vRealize Log Insight.

Architecture. Each EVO:RAIL appliance is a 2U/4-node unit, each node of which is an

independent physical server within the 2U enclosure. By adding a second, third, and fourth

appliance, EVO:RAIL scales to 8, 12, or 16 ESXi hosts in a cluster with a single VSAN datastore

managed by a dedicated vCenter Server. EVO:RAIL is based on the recent release of vSphere

5.5 U2. Each EVO:RAIL appliance has 100 GHz of compute power, 768 GB of memory

capacity, and 14.4 TB of raw HDD storage capacity (plus 1.6 TB of flash [SSD] for read/write

cache).
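
The scaling arithmetic in this paragraph can be sketched in a few lines of Python. This is only an illustration built from the per-appliance figures quoted above; the names `ClusterResources` and `cluster_resources` are hypothetical and not part of any VMware tool, and the sketch assumes that resources simply add up linearly as appliances are added.

```python
from dataclasses import dataclass

# Per-appliance figures quoted in the Architecture paragraph above.
NODES_PER_APPLIANCE = 4
COMPUTE_GHZ_PER_APPLIANCE = 100
MEMORY_GB_PER_APPLIANCE = 768
RAW_HDD_TB_PER_APPLIANCE = 14.4
FLASH_CACHE_TB_PER_APPLIANCE = 1.6
MAX_APPLIANCES = 4  # the text describes clusters of up to four appliances


@dataclass(frozen=True)
class ClusterResources:
    """Hypothetical container for aggregate cluster figures."""
    esxi_hosts: int
    compute_ghz: int
    memory_gb: int
    raw_hdd_tb: float
    flash_cache_tb: float


def cluster_resources(appliances: int) -> ClusterResources:
    """Aggregate resources, assuming they add up linearly per appliance."""
    if not 1 <= appliances <= MAX_APPLIANCES:
        raise ValueError(f"appliances must be between 1 and {MAX_APPLIANCES}")
    return ClusterResources(
        esxi_hosts=appliances * NODES_PER_APPLIANCE,
        compute_ghz=appliances * COMPUTE_GHZ_PER_APPLIANCE,
        memory_gb=appliances * MEMORY_GB_PER_APPLIANCE,
        # Round to one decimal place to hide binary floating-point noise.
        raw_hdd_tb=round(appliances * RAW_HDD_TB_PER_APPLIANCE, 1),
        flash_cache_tb=round(appliances * FLASH_CACHE_TB_PER_APPLIANCE, 1),
    )


if __name__ == "__main__":
    for count in range(1, MAX_APPLIANCES + 1):
        print(count, cluster_resources(count))
```

For a full four-appliance cluster, this gives 16 ESXi hosts, 3,072 GB of memory, and 57.6 TB of raw HDD capacity behind a single VSAN datastore. Usable capacity will be lower once VSAN protection policies are applied.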

During the initial setup, the EVO:RAIL engine starts on the boot drive to build the appliance. The

engine runs on the same VM as the vCenter Server – on ESXi host #1. When the appliance is

started, the vCenter Server is powered on automatically and the EVO:RAIL engine can then be

used to configure the appliance. During compute load-balancing, the engine may move to other

ESXi hosts in the cluster. Customers cannot use an existing vCenter Server for EVO:RAIL

because the EVO:RAIL appliance uses a dedicated vCenter Server.

Procurement Options. As VMware does not show any intention of becoming a hardware seller,

instead of shipping the complete hyper-converged EVO product itself, it sells the SDDC EVO

software to the Qualified EVO:RAIL Partners (QEP) who then resell it via a single stock keeping

unit (SKU) that includes hardware, software, service, and support. Pricing is set by QEPs, such

as Dell, EMC, HDS, NetApp, HP, Fujitsu, Inspur, Net One Systems, and Supermicro. The most

up-to-date list of QEPs can be found at http://www.vmware.com/products/evorail/pricing.html.

As EVO:RAIL is not a reference architecture but an all-in-one appliance, customers cannot

purchase the EVO:RAIL software standalone and attempt to build their own hyper-converged

appliance, whether on hardware qualified and optimized by EVO:RAIL partners or on non-qualified

server hardware.

Support and Maintenance. EVO:RAIL is a vendor-supported product. The vendor provides all

patches of the product, as it is tightly integrated with the hardware and firmware of the

appliance. All support is handled through a QEP rather than VMware. This ensures that end-to-

end support is consistently provided for both hardware and software.

Use Cases. While converged infrastructure solutions such as EMC VCE and NetApp FlexPod

target high-end enterprise environments, EVO:RAIL as a hyper-converged system is aimed at

non-mission-critical applications in both large enterprises and small to medium businesses

(SMBs).39 It can also be used in remote offices and branch offices (ROBOs). Whereas a

converged platform installation can take days or weeks to assemble, EVO:RAIL can be set up

in less than an hour. The easy installation and simple management mean that more ROBOs can

afford to have their own IT infrastructure based on EVO:RAIL. A single EVO:RAIL appliance

supports approximately 100 general-purpose virtual machines or 250 virtual desktops.
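
As a rough planning aid, the per-appliance workload figures above can be turned into an appliance count. The sketch below is illustrative only: the function name `appliances_needed` is hypothetical, the 100-VM and 250-desktop densities are the approximate values quoted above, and the four-appliance ceiling is taken from the Architecture paragraph. Real sizing depends on the actual workload profile.

```python
import math

# Approximate densities quoted in the text; treat them as planning estimates.
VMS_PER_APPLIANCE = 100
DESKTOPS_PER_APPLIANCE = 250
MAX_APPLIANCES = 4  # cluster ceiling described in the Architecture paragraph


def appliances_needed(workloads: int, per_appliance: int) -> int:
    """Return how many appliances a workload count needs, rounding up."""
    if workloads < 1:
        raise ValueError("workloads must be at least 1")
    needed = math.ceil(workloads / per_appliance)
    if needed > MAX_APPLIANCES:
        raise ValueError(
            f"{workloads} workloads would need {needed} appliances, "
            f"more than the {MAX_APPLIANCES}-appliance cluster limit"
        )
    return needed


if __name__ == "__main__":
    print(appliances_needed(250, VMS_PER_APPLIANCE))       # 250 general-purpose VMs
    print(appliances_needed(600, DESKTOPS_PER_APPLIANCE))  # 600 virtual desktops
```

Both sample calls come out at three appliances. A request that exceeds the cluster ceiling raises an error instead of returning a misleading number, which signals that a different platform should be considered.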

Automation of management tasks for the vSphere environment (creating VMs, datastores, etc.)

is one of the benefits offered by EVO:RAIL. EVO:RAIL also provides an automated patch and

upgrade mechanism for vSphere, vCenter, and the EVO:RAIL engine itself.

However, there are some limitations in the feature set provided by EVO:RAIL:

While NFS and iSCSI external storage is accessible using the standard tools via the

vSphere Web Client, Fibre Channel SAN storage cannot be connected to EVO:RAIL, as

there are no spare expansion slots for Fibre Channel HBAs in each of the four physical

server nodes. Dell offers a Nexenta appliance connected to EVO:RAIL via NFS for file

access.

EVO:RAIL currently does not support compression or deduplication.

If you want to migrate VMs from your existing virtualized environment to EVO:RAIL,

keep in mind that EVO:RAIL uses its own vCenter Server instance, and VMs cannot be

simply moved with vMotion and Storage vMotion to EVO:RAIL.

If you need to add hardware such as flash cards or graphics cards to your EVO:RAIL

appliance to meet workload demand, you cannot do so.

The storage profile for EVO:RAIL is described as Tier 2 storage. If your workload

requires higher storage performance, you need to look at other solutions.

EVO:RACK. While EVO:RAIL targets mid-size companies, the upcoming EVO:RACK, previewed

at VMworld 2014, will be aimed at meeting the requirements of large enterprises.40

EVO:RACK, which includes the full vCloud suite, Virtual SAN, NSX, and rack-level management tools,

can scale to tens of racks on pre-integrated hardware configurations ranging from Open

Compute Project-based hardware to industry-standard OEM servers and converged

infrastructures.

10. Conclusion

As I discussed in this article, software-defined storage (SDS), used as an umbrella

term for various SDS visions and their respective technologies, is still in the process of developing its

industry-accepted definition, standards, SDS stacks, and ecosystem of vendors and products.

Nevertheless, SDS, as a technology innovation enabling the transition to the Third Platform, is

attracting the attention of IT organizations looking to implement service-centric, agile storage

infrastructures. As a result, SDS affects future infrastructure design, storage service road maps,

and budget priorities. Instead of a “rip and replace” of the existing storage ecosystems,

organizations can leverage existing traditional storage systems and develop a strategy for

smooth integration with SDS. We as users need to work with both established storage vendors

and emerging vendors who are bringing innovative SDS solutions to the market so that we can

understand their SDS product features and develop our own storage service strategy for the

transition to the Third Platform to meet growing and diverse business requirements.

11. References

1. K. Clarke. SDE: Software-defined Everything. http://cloud.verizon.com/cloud-blog/sde-software-defined.
2. http://en.wikipedia.org/wiki/Software-defined_radio
3. C. Karamanolis. “2013 predictions: The year of software-defined storage?” VMware Chief Technology Office. http://cto.vmware.com/2013-predictions-the-year-of-software-defined-storage.
4. Delivering on the Promise of the Software-defined Data Center. VMware Accelerate Advisory Services. 2013.
5. http://www.cisco.com/c/en/us/solutions/data-center-virtualization/application-centric-infrastructure.
6. Software-defined Environment. IBM eBook. IBM, 2013.
7. J. Smith. Intel: Next Generation Software-defined Storage. EMC World, 2014.
8. http://www.emcfederation.com
9. R. Castanga. Storage Magazine. December 2014, pp. 3-4.
10. V. Filks. Looking to the future, what will happen, how will things look? http://blogs.gartner.com/valdis-filks/2014/02/03/looking-to-the-future-what-will-happen-how-will-things-look
11. M. Carlson, A. Yoder, L. Schoeb, D. Deel, C. Pratt. Software-defined Storage. Working Draft. SNIA, 2014.
12. Emerging Technology Analysis: Software-defined Storage Could Herald a Storage Architecture Evolution. Gartner, 30 August 2013.
13. The Future of Enterprise Storage: SDS and Hyper-converged. Baird Equity Research Technology & Services, 2014.
14. S.D. Lowe. Software-defined Storage for Dummies. Nutanix Special Edition. Wiley, 2014.
15. http://www.vmware.com/software-defined-datacenter/storage
16. http://ceph.com
17. http://www.gluster.org
18. M. Gloukhovtsev. Does the Advent of Cloud Storage Mean “Creation by Destruction” of Traditional Storage? EMC Proven Professional Knowledge Sharing, 2013.
19. M. Gloukhovtsev. Does Big Data Mean Big Storage? EMC Proven Professional Knowledge Sharing, 2014.
20. C. Hollis. 2015 — The 3rd Platform Gets Real For IT. http://www.emc.com/microsites/cio
21. The State of Software-defined Storage (SDS). Market Survey. DataCore Software, 2014.
22. A. Nadkarni. Software-defined Storage: A Fundamental Shift in How Storage Is Delivered and Used. IDC, June 2014.
23. http://www-03.ibm.com/systems/storage/software-defined-storage
24. P. C. Zikopoulos, D. deRoos, K. Parasuraman, T. Deutsch, D. Corrigan, J. Giles. Harness the Power of Big Data. The IBM Big Data Platform. McGraw-Hill, 2013.
25. D. Martin. Next Generation Storage Networking for Next Generation Data Centers. SNIA Storage Developer Conference, 2014.
26. M. Dunn, T. Feldman. Shingled Magnetic Recording Models. SNIA Storage Developer Conference, 2014.
27. http://www.emc.com/vipr
28. Rethink Storage. EMC White Paper. EMC, 2013.
29. EMC ViPR Concepts Guide. EMC, 2014.
30. P. Hallur. Third Party Array Support in ViPR Through OpenStack. http://www.rethinkstorage.com/third-party-array-support-in-vipr-through-openstack
31. http://www.emc.com/storage/ecs-appliance
32. http://www.emc.com/storage/scaleio
33. K. Closson. Leveraging ScaleIO in Software-defined Storage Use Cases. EMC World, 2014.
34. Federation Software-defined Data Center Foundation Infrastructure Solution Guide. EMC, 2014.
35. A. Farronato, V. Ramachandran. VMware Vision and Strategy for Software-defined Storage. VMworld, 2014.
36. http://www.vmware.com/products/virtual-san
37. C. Hollis. Using VSAN with storage arrays. http://chucksblog.emc.com/chucks_blog/2014/03/using-vsan-with-storage-arrays.html
38. C. Sakac. vSAN vs. ScaleIO. http://virtualgeek.typepad.com/virtual_geek/2013/07/vsan-vs-scaleio-fight.html
39. Introduction to VMware EVO: RAIL. VMware White Paper. VMware, 2014.
40. R. Yavatkar. EVO: RACK Tech Preview at VMworld, 2014. http://blogs.vmware.com/cto/evo-rack-tech-preview-vmworld-2014

EMC believes the information in this publication is accurate as of its publication date. The

information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION

MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO

THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED

WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an

applicable software license.