Software-as-a-Service (SaaS)
The traditional model of software distribution, in which software is purchased for and
installed on personal computers, is sometimes referred to as Software-as-a-Product.
Software-as-a-Service is a software distribution model in which applications are hosted
by a vendor or service provider and made available to customers over a network,
typically the Internet. SaaS is becoming an increasingly prevalent delivery model as
underlying technologies that support web services and service-oriented architecture
(SOA) mature and new development approaches become popular. SaaS is also often
associated with a pay-as-you-go subscription licensing model. Meanwhile, broadband
service has become increasingly available to support user access from
more areas around the world. The huge strides made by Internet Service Providers
(ISPs) to increase bandwidth, and the constant introduction of ever more powerful
microprocessors coupled with inexpensive data storage devices, are providing a huge
platform for designing, deploying, and using software across all areas of business and
personal computing. SaaS applications also must be able to interact with other data and
other applications in an equally wide variety of environments and platforms. SaaS is
closely related to other service delivery models we have described. IDC identifies two
slightly different delivery models for SaaS. The hosted application management model is
similar to an Application Service Provider (ASP) model. Here, an ASP hosts
commercially available software for customers and delivers it over the Internet. The
other model is a software on demand model where the provider gives customers
network-based access to a single copy of an application created specifically for SaaS
distribution. IDC predicted that SaaS would make up30% of the software market by
2007 and would be worth $10.7 billion by the end of 2009.
SaaS is most often implemented to provide business software functionality to enterprise
customers at a low cost while allowing those customers to obtain the same benefits
of commercially licensed, internally operated
software without the associated complexity of installation, management, support,
licensing, and high initial cost.
Most customers have little interest in the how or why of software implementation,
deployment, etc., but all have a need to use software in their work. Many types of
software are well suited to the SaaS model (e.g., accounting, customer relationship
management, email software, human resources, IT security, IT service management,
videoconferencing, web analytics, web content management). The distinction between
SaaS and earlier applications delivered over the Internet is that SaaS solutions were
developed specifically to work within a web browser. The architecture of SaaS-based
applications is specifically designed to support many concurrent users (multitenancy).
This is a big difference from the traditional client/server or application service
provider (ASP)-based solutions that cater to a contained audience. SaaS providers, on
the other hand, leverage enormous economies of scale in the deployment, management,
support, and maintenance of their offerings.
SaaS Implementation Issues
Many types of software components and applications frameworks may be employed in
the development of SaaS applications. Using new technology found in these modern
components and application frameworks can drastically reduce the time to market and
cost of converting a traditional on-premises product into a SaaS solution. According to
Microsoft, SaaS architectures can be classified into one of four maturity levels whose key
attributes are ease of configuration, multitenant efficiency, and scalability. Each level is
distinguished from the previous one by the addition of one of these three attributes. The
levels described by Microsoft are as follows:
SaaS Architectural Maturity Level 1—Ad-Hoc/Custom.
The first level of maturity is actually no maturity at all. Each customer has a unique,
customized version of the hosted application. The application runs its own instance on
the host’s servers. Migrating a traditional non-networked or client-server application to
this level of SaaS maturity typically requires the least development effort and reduces
operating costs by consolidating server hardware and administration.
SaaS Architectural Maturity Level 2—Configurability.
The second level of SaaS maturity provides greater program flexibility through
configuration metadata. At this level, many customers can use separate instances of the
same application. This allows a vendor to meet the varying needs of each customer by
using detailed configuration options. It also allows the vendor to ease the maintenance
burden by being able to update a common code base.
SaaS Architectural Maturity Level 3—Multitenant Efficiency.
The third maturity level adds multitenancy to the second level. This results in a single
program instance that has the capability to serve all of the vendor’s customers. This
approach enables more efficient use of server resources without any apparent difference
to the end user, but ultimately this level is limited in its ability to scale massively.
SaaS Architectural Maturity Level 4—Scalable.
At the fourth SaaS maturity level, scalability is added by using a multitiered
architecture. This architecture is capable of supporting a load-balanced farm of identical
application instances running on a variable number of servers, sometimes in the
hundreds or even thousands. System capacity can be dynamically increased or
decreased to match load demand by adding or removing servers, with no need for
further alteration of the application software architecture.
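To make the configurability and multitenancy attributes concrete, here is a minimal sketch (with invented tenant names and settings) of a single application instance serving many tenants from shared configuration metadata, rather than from per-customer custom code:

    # Minimal sketch of multitenant configuration lookup (Levels 2-3).
    # A single application instance serves every tenant; per-tenant
    # behavior comes from configuration metadata, not custom code.

    from dataclasses import dataclass

    @dataclass
    class TenantConfig:
        tenant_id: str
        locale: str = "en_US"
        features: tuple = ()

    # Configuration metadata; in a real service this lives in a shared store.
    TENANT_CONFIGS = {
        "acme": TenantConfig("acme", locale="en_GB", features=("reports",)),
        "globex": TenantConfig("globex", features=("reports", "exports")),
    }

    def handle_request(tenant_id: str, action: str) -> str:
        # One code path for all tenants; metadata decides what each one sees.
        config = TENANT_CONFIGS[tenant_id]
        if action not in config.features:
            return f"{tenant_id}: '{action}' is not enabled"
        return f"{tenant_id}: running '{action}' with locale {config.locale}"

    print(handle_request("acme", "exports"))    # not enabled for acme
    print(handle_request("globex", "exports"))  # enabled for globex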
Key Characteristics of SaaS
Deploying applications in a service-oriented architecture is a more complex problem
than is usually encountered in traditional models of software deployment. As a result,
SaaS applications are generally priced based on the number of users that can have
access to the service. There are often additional fees for the use of help desk services,
extra bandwidth, and storage. SaaS revenue streams to the vendor are usually lower
initially than traditional software license fees. However, the trade-off for lower license
fees is a monthly recurring revenue stream, which is viewed by most corporate CFOs as
a more predictable gauge of how the business is faring quarter to quarter. These
monthly recurring charges are viewed much like maintenance fees for licensed software.
The key characteristics of SaaS software are the following:
1. Network-based management and access to commercially available software from
central locations rather than at each customer’s site, enabling customers to access
applications remotely via the Internet.
2. Application delivery from a one-to-many model (single-instance, multitenant
architecture), as opposed to a traditional one-to-one model.
3. Centralized enhancement and patch updating that obviates any need for
downloading and installing by a user. SaaS is often used in conjunction with a
larger network of communications and collaboration software, sometimes as a
plug-in to a PaaS architecture.
Benefits of the SaaS Model
Application deployment cycles inside companies can take years, consume massive
resources, and yield unsatisfactory results. Although the initial decision to relinquish
control is a difficult one, it is one that can lead to improved efficiency, lower risk, and a
generous return on investment.
An increasing number of companies want to use the SaaS model for corporate
applications such as customer relationship management and those that fall under the
Sarbanes-Oxley Act compliance umbrella (e.g., financial recording and human
resources). The SaaS model helps enterprises ensure that all locations are using the
correct application version and, therefore, that the format of the data being recorded
and conveyed is consistent, compatible, and accurate. By placing the responsibility for
an application onto the doorstep of a SaaS provider, enterprises can reduce
administration and management burdens they would otherwise have for their own
corporate applications. SaaS also helps to increase the availability of applications to
global locations. SaaS also ensures that all application transactions are logged for
compliance purposes. The benefits of SaaS to the customer are very clear:
1. Streamlined administration
2. Automated update and patch management services
3. Data compatibility across the enterprise (all users have the same version of
software)
4. Facilitated, enterprise-wide collaboration
5. Global accessibility
Server virtualization can be used in SaaS architectures, either in place of or in addition
to multitenancy. A major benefit of platform virtualization is that it can increase a
system’s capacity without any need for additional programming. Conversely, a huge
amount of programming may be required in order to construct more efficient,
multitenant applications. The effect of combining multitenancy and platform
virtualization into a SaaS solution provides greater flexibility and performance to the
end user. In this chapter, we have discussed how the computing world has moved from
stand-alone, dedicated computing to client/network computing and on into the cloud
for remote computing. The advent of web-based services has given rise to a variety of
service offerings, sometimes known collectively as XaaS. We covered these service
models, focusing on the type of service provided to the customer (i.e., communications,
infrastructure, monitoring, outsourced platforms, and software).
Platform-as-a-Service (PaaS)
Cloud computing has evolved to include platforms for building and running custom
web-based applications, a concept known as Platform-as-a-Service. PaaS is an
outgrowth of the SaaS application delivery model. The PaaS model makes all of the
facilities required to support the complete lifecycle of building and delivering web
applications and services entirely available from the Internet, all with no software
downloads or installation for developers, IT managers, or end users. Unlike the IaaS
model, where developers may create a specific operating system instance with home-
grown applications running, PaaS developers are concerned only with web-based
development and generally do not care what operating system is used. PaaS services
allow users to focus on innovation rather than complex infrastructure. Organizations
can redirect a significant portion of their budgets to creating applications that provide
real business value instead of worrying about all the infrastructure issues in a roll-your-
own delivery model. The PaaS model is thus driving a new era of mass innovation. Now,
developers around the world can access unlimited computing power. Anyone with an
Internet connection can build powerful applications and easily deploy them to users
globally.
The Traditional On-Premises Model
The traditional approach of building and running on-premises applications has always
been complex, expensive, and risky. Building your own solution has never offered any
guarantee of success. Each application was designed to meet specific business
requirements. Each solution required a specific set of hardware, an operating system, a
database, often a middleware package, email and web servers, etc. Once the
hardware and software environment was created, a team of developers had to navigate
complex programming development platforms to build their applications. Additionally,
a team of network, database, and system management experts was needed to keep
everything up and running. Inevitably, a business requirement would force the
developers to make a change to the application. The changed application then required
new test cycles before being distributed. Large companies often needed
specialized facilities to house their data centers. Enormous amounts of electricity also
were needed to power the servers as well as to keep the systems cool. Finally, all of this
required use of fail-over sites to mirror the data center so that information could be
replicated in case of a disaster. Old days, old ways. Now, let's fly into the silver lining of
today's cloud.
The New Cloud Model
PaaS offers a faster, more cost-effective model for application development and delivery.
PaaS provides all the infrastructure needed to run applications over the Internet. Such
is the case with companies such as Amazon.com, eBay, Google, iTunes, and YouTube.
The new cloud model has made it possible to deliver such new capabilities to new
markets via the web browser. PaaS is based on a metering or subscription model, so
users pay only for what they use.
PaaS offerings include workflow facilities for application design, application
development, testing, deployment, and hosting, as well as application services such as
virtual offices, team collaboration, database integration, security, scalability, storage,
persistence, state management, dashboard instrumentation, etc.
Key Characteristics of PaaS
Chief characteristics of PaaS include services to develop, test, deploy, host, and manage
applications to support the application development life cycle. Web-based
user interface creation tools typically provide some level of support to simplify the
creation of user interfaces, based either on common standards such as HTML and
JavaScript or on other, proprietary technologies. Supporting a multitenant architecture
helps to remove developer concerns regarding the use of the application by many
concurrent users. PaaS providers often include services for concurrency management,
scalability, failover, and security. Another characteristic is integration with web
services and databases. Support for Simple Object Access Protocol (SOAP) and other
interfaces allows PaaS offerings to create combinations of web services (called mashups)
as well as having the ability to access databases and reuse services maintained inside
private networks. The ability to form and share code with ad-hoc, predefined, or
distributed teams greatly enhances the productivity of PaaS offerings. Integrated PaaS
offerings provide an opportunity for developers to have much greater insight into the
inner workings of their applications and the behavior of their users by implementing
dashboard-like tools to view the inner workings based on measurements such as
performance, number of concurrent accesses, etc. Some PaaS offerings leverage this
instrumentation to enable pay-per-use billing models.
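As a hedged illustration of the mashup idea, the sketch below stitches two hypothetical REST endpoints into one combined result; the URLs and JSON fields are invented for the example, and a real PaaS catalog would supply its own SOAP or REST interfaces:

    # Minimal mashup sketch: combine two web services into one response.
    # Both endpoints and their JSON shapes are hypothetical.

    import requests

    def get_weather(city: str) -> dict:
        resp = requests.get("https://weather.example.com/api", params={"city": city})
        resp.raise_for_status()
        return resp.json()  # assumed shape: {"tempC": 21}

    def get_events(city: str) -> dict:
        resp = requests.get("https://events.example.com/api", params={"city": city})
        resp.raise_for_status()
        return resp.json()  # assumed shape: {"events": ["...", "..."]}

    def city_dashboard(city: str) -> dict:
        # The "mashup": one combined view built from two independent services.
        return {
            "city": city,
            "temperature_c": get_weather(city).get("tempC"),
            "events": get_events(city).get("events", []),
        }

    if __name__ == "__main__":
        print(city_dashboard("Amsterdam"))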
Infrastructure-as-a-Service (IaaS)
According to the online reference Wikipedia, Infrastructure-as-a-Service (IaaS) is the
delivery of computer infrastructure (typically a platform virtualization environment) as
a service.
IaaS leverages significant technology, services, and data center investments to deliver
IT as a service to customers. Unlike traditional outsourcing, which requires extensive
due diligence, negotiations ad infinitum, and complex, lengthy contract vehicles, IaaS is
centered around a model of service delivery that provisions a predefined, standardized
infrastructure specifically optimized for the customer's applications. Simplified
statements of work and à la carte service-level choices make it easy to tailor a solution to
a customer’s specific application requirements. IaaS providers manage the transition
and hosting of selected applications on their infrastructure. Customers maintain
ownership and management of their application(s) while off-loading hosting operations
and infrastructure management to the IaaS provider. Provider-owned implementations
typically include the following layered components:
• Computer hardware (typically set up as a grid for massive horizontal scalability)
• Computer network (including routers, firewalls, load balancing, etc.)
• Internet connectivity (often on OC-192 backbones)
• Platform virtualization environment for running client-specified virtual machines
• Service-level agreements
• Utility computing billing
Rather than purchasing data center space, servers, software, network equipment, etc.,
IaaS customers essentially rent those resources as a fully outsourced service. Usually,
the service is billed on a monthly basis, just like a utility company bills customers. The
customer is charged only for resources consumed.
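The utility-style billing described above reduces to metering consumption and multiplying by a rate. A minimal sketch, with invented rates and usage figures purely to show the shape of the calculation:

    # Sketch of utility-style billing: charge only for resources consumed.
    # Rates and usage figures are illustrative, not any provider's prices.

    RATES = {
        "cpu_hours": 0.10,   # $ per CPU-hour
        "storage_gb": 0.15,  # $ per GB-month
        "egress_gb": 0.12,   # $ per GB transferred out
    }

    def monthly_bill(usage: dict) -> float:
        # Sum rate * quantity over every metered resource.
        return round(sum(RATES[k] * v for k, v in usage.items()), 2)

    print(monthly_bill({"cpu_hours": 400, "storage_gb": 50, "egress_gb": 120}))
    # 400*0.10 + 50*0.15 + 120*0.12 = 40.0 + 7.5 + 14.4 = 61.9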
The chief benefits of using this type of outsourced service include:
• Ready access to a preconfigured environment that is generally ITIL-based (The
Information Technology Infrastructure Library [ITIL] is a customized framework
of best practices designed to promote quality computing services in the IT
sector.)
• Use of the latest technology for infrastructure equipment
• Secured, “sandboxed” (protected and insulated) computing platforms that are
usually security monitored for breaches
• Reduced risk by having off-site resources maintained by third parties
• Ability to manage service-demand peaks and valleys
• Lower costs that allow expensing service costs instead of making capital
investments
• Reduced time, cost, and complexity in adding new features or capabilities
Modern On-Demand Computing
On-demand computing is an increasingly popular enterprise model
in which computing resources are made available to the user as needed. Computing
resources that are maintained on a user’s site are becoming fewer and fewer, while those
made available by a service provider are on the rise. The on-demand model evolved to
overcome the challenge of being able to meet fluctuating resource demands efficiently.
Because demand for computing resources can vary drastically from one time to another,
maintaining sufficient resources to meet peak requirements can be costly.
Overengineering a solution can be just as adverse as a situation where the enterprise cuts
costs by maintaining only minimal computing resources, resulting in insufficient
resources to meet peak load requirements. Concepts such as clustered computing, grid
computing, utility computing, etc., may all seem very similar to the concept of on-
demand computing, but they can be better understood if one thinks of them as building
blocks that evolved over time to achieve the modern cloud computing model we think of
and use today (see Figure 2.1). One example we will examine is Amazon’s Elastic
Compute Cloud (Amazon EC2). This is a web service that
provides resizable computing capacity in the cloud. It is designed to make web-scale
computing easier for developers and offers many advantages to customers:
• Its web service interface allows customers to obtain and configure capacity with
minimal effort.
• It provides users with complete control of their (leased) computing resources and
lets them run on a proven computing environment.
• It reduces the time required to obtain and boot new server instances to minutes,
allowing customers to quickly scale capacity as their computing demands dictate.
• It changes the economics of computing by allowing clients to pay only for
capacity they actually use.
• It provides developers the tools needed to build failure-resilient applications and
isolate themselves from common failure scenarios.
Amazon’s Elastic Cloud
Amazon EC2 presents a true virtual computing environment, allowing clients to use a
web-based interface to obtain and manage services needed to launch one or more
instances of a variety of operating systems (OSs). Clients can load the OS environments
with their customized applications. They can manage their network’s access permissions
and run as many or as few systems as needed. In order to use Amazon EC2, clients first
need to create an Amazon Machine Image (AMI). This image contains the applications,
libraries, data, and associated configuration settings used in the virtual computing
environment. Amazon EC2 offers the use of preconfigured images built with templates
to get up and running immediately. Once users have defined and configured their AMI,
they use the Amazon EC2 tools provided for storing the AMI by uploading the AMI into
Amazon S3. Amazon S3 is a repository that provides safe, reliable, and fast access to a
client AMI. Before clients can use the AMI, they must use the Amazon EC2 web service
to configure security and network access.
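For a sense of what this workflow looks like in code, here is a sketch that launches an instance from an existing AMI using boto3, the current AWS SDK for Python. The AMI ID, key pair, and security group are placeholders, and the manual AMI-to-S3 upload step described above is nowadays largely handled by Amazon's own tooling:

    # Sketch of launching an EC2 instance with boto3 (the AWS SDK for Python).
    # The AMI ID, key name, and security group below are placeholders.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # the AMI: OS plus your applications
        InstanceType="t3.micro",
        MinCount=1,
        MaxCount=1,
        KeyName="my-key-pair",                      # controls SSH access
        SecurityGroupIds=["sg-0123456789abcdef0"],  # network access rules
    )

    instance_id = response["Instances"][0]["InstanceId"]
    print("Launched:", instance_id)

    # Later, release the capacity you no longer need:
    # ec2.terminate_instances(InstanceIds=[instance_id])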
Amazon EC2 Service Characteristics
There are quite a few characteristics of the EC2 service that provide significant benefits
to an enterprise. First of all, Amazon EC2 provides financial benefits. Because of
Amazon’s massive scale and large customer base, it is an inexpensive alternative to
many other possible solutions. The costs incurred to set up and run an operation are
shared over many customers, making the overall cost to any single customer much lower
than almost any other alternative. Customers pay a very low rate for the compute
capacity they actually consume. Security is also provided through Amazon EC2 web
service interfaces. These allow users to configure firewall settings that control
network access to and between groups of instances.
Amazon EC2 offers a highly reliable environment where replacement instances can be
rapidly provisioned.
When one compares this solution to the significant up-front expenditures traditionally
required to purchase and maintain hardware, either in-house or hosted, the decision to
outsource is not hard to make. Outsourced solutions like EC2 free customers from many
of the complexities of capacity planning and allow clients to move from large capital
investments and fixed costs to smaller, variable, expensed costs. This approach removes
the need to over buy and over build capacity to handle periodic traffic spikes. The EC2
service runs within Amazon’s proven, secure, and reliable network infrastructure and
data center locations.
Dynamic Scalability
Amazon EC2 enables users to increase or decrease capacity in a few minutes. Users can
invoke a single instance, hundreds of instances, or even thousands of instances
simultaneously. Of course, because this is all controlled with web service APIs, an
application can automatically scale itself up or down depending on its needs. This type
of dynamic scalability is very attractive to enterprise customers because it allows them
to meet their customers’ demands without having to overbuild their infrastructure.
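A hedged sketch of such self-scaling logic follows: poll a load figure and adjust capacity through an API call. The group name, thresholds, and policy are invented for illustration; production systems would normally rely on the provider's built-in scaling policies instead:

    # Sketch of self-scaling via web service APIs (boto3).
    # Group name, thresholds, and policy are illustrative only.

    import boto3

    autoscaling = boto3.client("autoscaling", region_name="us-east-1")
    GROUP = "my-web-tier"

    def rescale(current_load: float, capacity: int) -> int:
        # Toy policy: grow under heavy load, shrink when mostly idle.
        if current_load > 0.80:
            capacity = min(capacity + 2, 20)
        elif current_load < 0.20:
            capacity = max(capacity - 1, 1)
        return capacity

    desired = rescale(current_load=0.9, capacity=4)
    autoscaling.set_desired_capacity(
        AutoScalingGroupName=GROUP,
        DesiredCapacity=desired,
    )
    print("Requested capacity:", desired)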
Configuration Flexibility
Configuration settings can vary widely among users. They have the choice of multiple
instance types, operating systems, and software packages. Amazon EC2 allows them to
select a configuration of memory, CPU, and instance storage that is optimal for their
choice of operating system and application. For example, a user’s choice of operating
systems may also include numerous Linux distributions, Microsoft Windows Server, and
even an OpenSolaris environment, all running on virtual servers.
Integration with Other Amazon Web Services
Amazon EC2 works in conjunction with a variety of other Amazon web services. For
example, Amazon Simple Storage Service (Amazon S3), Amazon SimpleDB, Amazon
Simple Queue Service (Amazon SQS), and Amazon CloudFront are all integrated to
provide a complete solution for computing, query processing, and storage across a wide
range of applications. Amazon S3 provides a web services interface that allows users
to store and retrieve any amount of data from the Internet at any time, anywhere. It
gives developers direct access to the same highly scalable, reliable, fast, and inexpensive data
storage infrastructure Amazon uses to run its own global network of web sites. The S3
service aims to maximize benefits of scale and to pass those benefits on to developers.
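To ground the store-and-retrieve claim, here is a minimal boto3 sketch of writing and reading an S3 object; the bucket name is a placeholder and must already exist in your account:

    # Sketch of storing and retrieving an object in Amazon S3 with boto3.
    # The bucket name is a placeholder; buckets must be created first.

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-example-bucket"

    # Store any amount of data under a key...
    s3.put_object(Bucket=BUCKET, Key="reports/2024.txt", Body=b"hello, cloud")

    # ...and retrieve it from anywhere, at any time.
    obj = s3.get_object(Bucket=BUCKET, Key="reports/2024.txt")
    print(obj["Body"].read().decode())  # -> hello, cloud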
Monitoring-as-a-Service (MaaS)
Monitoring-as-a-Service (MaaS) is the outsourced provisioning of security, primarily on
business platforms that leverage the Internet to conduct business.
MaaS has become increasingly popular over the last decade. Since the advent of cloud
computing, its popularity has grown even more. Security monitoring involves
protecting an enterprise or government client from cyber threats. A security team plays
a crucial role in securing and maintaining the confidentiality, integrity, and availability
of IT assets. However, time and resource constraints limit security operations and their
effectiveness for most companies. This requires constant vigilance over the security
infrastructure and critical information assets. Many industry regulations require
organizations to monitor their security environment, server logs, and other information
assets to ensure the integrity of these systems. However, conducting effective security
monitoring can be a daunting task because it requires advanced technology, skilled
security experts, and scalable processes—none of which come cheap. MaaS security
monitoring services offer real-time, 24/7 monitoring and nearly immediate incident
response across a security infrastructure—they help to protect critical information assets
of their customers. Prior to the advent of electronic security systems, security
monitoring and response were heavily dependent on human resources and human
capabilities, which also limited the accuracy and effectiveness of monitoring efforts.
Over the past two decades, the adoption of information technology into facility security
systems, and their ability to be connected to security operations centers (SOCs) via
corporate networks, has significantly changed that picture. This means two important
things:
(1) The total cost of ownership (TCO) for traditional SOCs is much higher than for a
modern-technology SOC; and
(2) achieving lower security operations costs and higher security effectiveness means
that modern SOC architecture must use security and IT technology to address security
risks.
Protection Against Internal and External Threats
SOC-based security monitoring services can improve the effectiveness of a customer
security infrastructure by actively analyzing logs and alerts from infrastructure devices
around the clock and in real time. Monitoring teams correlate information from various
security devices to provide security analysts with the data they need to eliminate false
positives and respond to true threats against the enterprise. Having consistent access to
the skills needed to maintain the level of service an organization requires for enterprise-
level monitoring is a huge issue. The information security team can assess system
performance on a periodically recurring basis and provide recommendations for
improvements as needed. Typical services provided by many MaaS vendors
are described below.
Early Detection
An early detection service detects and reports new security vulnerabilities shortly after
they appear. Generally, the threats are correlated with third-party sources, and an alert
or report is issued to customers. This report is usually sent by email to the person
designated by the company. Security vulnerability reports, aside from containing a
detailed description of the vulnerability and the platforms affected, also include
information on the impact the exploitation of this vulnerability would have on the
systems or applications previously selected by the company receiving the report. Most
often, the report also indicates specific actions to be taken to minimize the effect of the
vulnerability, if that is known.
Platform, Control, and Services Monitoring
Platform, control, and services monitoring is often implemented as a dashboard
interface and makes it possible to know the operational status of the platform being
monitored at any time. It is accessible from a web interface, making remote access
possible. Each operational element that is monitored usually provides an operational
status indicator, always taking into account the critical impact of each element. This
service aids in determining which elements may be operating at or near capacity or
beyond the limits of established parameters. By detecting and identifying such
problems, preventive measures can be taken to prevent loss of service.
Intelligent Log Centralization and Analysis
Intelligent log centralization and analysis is a monitoring solution based mainly on the
correlation and matching of log entries. Such analysis helps to establish a baseline of
operational performance and provides an index of security threats. Alarms can be raised
in the event an incident moves the established baseline parameters beyond a stipulated
threshold. These types of sophisticated tools are used by a team of security experts who
are responsible for incident response once such a threshold has been crossed and the
threat has generated an alarm or warning picked up by security analysts monitoring the
systems.
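The baseline-and-threshold logic described here can be sketched in a few lines of Python; the metric, window, and threshold multiplier are invented for illustration:

    # Sketch of baseline-plus-threshold alerting over centralized log counts.
    # Window size and threshold multiplier are illustrative choices.

    from statistics import mean

    def check_for_alarm(event_counts: list[int], threshold: float = 3.0) -> bool:
        # Baseline: average events per interval over the history window.
        baseline = mean(event_counts[:-1])
        latest = event_counts[-1]
        # Alarm when the newest interval exceeds baseline * threshold.
        return latest > baseline * threshold

    failed_logins_per_hour = [4, 6, 5, 7, 5, 42]
    if check_for_alarm(failed_logins_per_hour):
        print("ALARM: failed-login rate far above baseline; escalate to analysts")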
Vulnerabilities Detection and Management
Vulnerabilities detection and management enables automated verification and
management of the security level of information systems. The service periodically
performs a series of automated tests for the purpose of identifying system weaknesses
that may be exposed over the Internet, including the possibility of unauthorized access
to administrative services, the existence of services that have not been updated, the
detection of vulnerabilities such as phishing, etc. The service performs periodic follow-
up of tasks performed by security professionals managing information systems security
and provides reports that can be used to implement a plan for continuous improvement
of the system’s security level.
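As a hedged, standard-library-only illustration of such automated tests, the sketch below probes whether common administrative services are reachable; the address is an example from a documentation range, and checks like this should only be run against systems you are authorized to test:

    # Sketch of an automated exposure check: are admin services reachable?
    # Only run checks like this against systems you are authorized to test.

    import socket

    ADMIN_PORTS = {22: "SSH", 23: "Telnet", 3389: "RDP"}

    def exposed_admin_services(host: str, timeout: float = 1.0) -> list[str]:
        findings = []
        for port, name in ADMIN_PORTS.items():
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
                sock.settimeout(timeout)
                # connect_ex returns 0 when the TCP connection succeeds.
                if sock.connect_ex((host, port)) == 0:
                    findings.append(f"{name} open on {host}:{port}")
        return findings

    for finding in exposed_admin_services("203.0.113.10"):  # example address
        print("WEAKNESS:", finding)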
Continuous System Patching/Upgrade and Fortification
Security posture is enhanced with continuous system patching and upgrading of systems
and application software. New patches, updates, and service packs for the equipment’s
operating system are necessary to maintain adequate security levels and support new
versions of installed products. Keeping abreast of all the changes to all the software and
hardware requires a committed effort to stay informed and to communicate gaps in
security that can appear in installed systems and applications.
Intervention, Forensics, and Help Desk Services
Quick intervention when a threat is detected is crucial to mitigating the effects of a
threat. This requires security engineers with ample knowledge in the various
technologies and with the ability to support applications as well as infrastructures on a
24/7 basis. MaaS platforms routinely provide this service to their customers. When a
detected threat is analyzed, it often requires forensic analysis to determine what it is,
how much effort it will take to fix the problem, and what effects are likely to be seen.
When problems are encountered, the first thing customers tend to do is pick up the
phone. Help desk services provide assistance on questions or issues about the operation
of running systems. This service includes assistance in writing failure reports, managing
operating problems, etc.
Delivering Business Value
Some consider balancing the overall economic impact of any build-versus-buy decision
as a more significant measure than simply calculating a return on investment (ROI). The
key cost categories that are most often associated with MaaS are:
(1) Service fees for security event monitoring for all firewalls and intrusion detection
devices, servers, and routers;
(2) Internal account maintenance and administration costs; and
(3) Preplanning and development costs. Based on the total cost of ownership, whenever
a customer evaluates the option of an in-house security information monitoring team
and infra-structure compared to outsourcing to a service provider, it does not take long
to realize that establishing and maintaining an in-house capability is not as attractive as
outsourcing the service to a provider with an existing infrastructure. Having an in-house
security operations center forces a company to deal with issues such as staff attrition,
scheduling, around-the-clock operations, etc. Losses incurred from external and internal
incidents are extremely significant, as evidenced by a regular stream of high-profile
cases in the news. The generally accepted method of valuing the risk of losses from
external and internal incidents is to look at the amount of a potential loss, assume
a frequency of loss, and estimate a probability for incurring the loss. Although this
method is not perfect, it provides a means for tracking information security metrics.
Risk is used as a filter to capture uncertainty about varying cost and benefit estimates. If
a risk-adjusted ROI demonstrates a compelling business case, it raises confidence that
the investment is likely to succeed because the risks that threaten the project have been
considered and quantified. Flexibility represents an investment in additional capacity or
agility today that can be turned into future business benefits at some additional cost.
This provides an organization with the ability to engage in future initiatives, but not the
obligation to do so. The value of flexibility is unique to each organization, and
willingness to measure its value varies from company to company.
Real-Time Log Monitoring Enables Compliance
Security monitoring services can also help customers comply with industry regulations
by automating the collection and reporting of specific events of interest, such as log-in
failures. Regulations and industry guidelines often require log monitoring of critical
servers to ensure the integrity of confidential data. MaaS providers’ security monitoring
services automate this time-consuming process.
Communication-as-a-Service (CaaS)
CaaS is an outsourced enterprise communications solution. Providers of this type of
cloud-based solution (known as CaaS vendors) are responsible for the management of
hardware and software required for delivering Voice over IP (VoIP) services, Instant
Messaging (IM), and video conferencing capabilities to their customers. This model
began its evolutionary process from within the telecommunications (Telco) industry, not
unlike how the SaaS model arose from the software delivery services sector. CaaS
vendors are responsible for all of the hardware and software management consumed by
their user base. CaaS vendors typically offer guaranteed quality of service (QoS) under a
service-level agreement (SLA). A CaaS model allows a CaaS provider’s
business customers to selectively deploy communications features and services
throughout their company on a pay-as-you-go basis for service(s) used. CaaS is designed
on a utility-like pricing model that provides users with comprehensive, flexible, and
(usually) simple-to-understand service plans. According to Gartner, the CaaS market is
expected to total $2.3 billion in 2011, representing a compound annual growth rate of
more than 105% for the period. CaaS service offerings are often bundled and may
include integrated access to traditional voice (or VoIP) and data, advanced unified
communications functionality such as video calling, web collaboration, chat, real-time
presence and unified messaging, a handset, local and long-distance voice services, voice
mail, advanced calling features (such as caller ID, three-
way and conference calling, etc.) and advanced PBX functionality.
A CaaS solution includes redundant switching, network, POP and circuit diversity,
customer premises equipment redundancy, and WAN fail-over that specifically
addresses the needs of their customers. All VoIP transport components are located in
geographically diverse, secure data centers for high availability and survivability. CaaS
offers flexibility and scalability that small and medium-sized businesses might not
otherwise be able to afford. CaaS service providers are usually prepared to handle peak
loads for their customers by providing services capable of allowing more capacity,
devices, modes or area coverage as their customer demand necessitates. Network
capacity and feature sets can be changed dynamically, so functionality keeps pace with
consumer demand and provider-owned resources are not wasted. From the service
provider customer’s perspective, there is very little to virtually no risk of the service
becoming obsolete, since the provider’s responsibility is to perform periodic upgrades or
replacements of hardware and software to keep the platform technologically current.
CaaS requires little to no management oversight from customers. It eliminates the
business customer’s need for any capital investment in infrastructure, and it eliminates
expense for ongoing maintenance and operations overhead for infrastructure. With a
CaaS solution, customers are able to leverage enterprise-class communication services
without having to build a premises-based solution of their own. This allows those
customers to reallocate budget and personnel resources to where their business can best
use them.
Advantages of CaaS
From the handset found on each employee’s desk to the PC-based software client on
employee laptops, to the VoIP private backbone, and all modes in between, every
component in a CaaS solution is managed 24/7 by the CaaS vendor. As we said
previously, the expense of managing a carrier-grade data center is shared across the
vendor’s customer base, making it more economical for businesses to implement CaaS
than to build their own VoIP network. Let’s look at some of the advantages of a hosted
approach for CaaS.
Hosted and Managed Solutions
Remote management of infrastructure services provided by third parties once seemed
an unacceptable situation to most companies. However, over the past decade, with
enhanced technology, networking, and software, the attitude has changed. This is, in
part, due to cost savings achieved in using those services. However, unlike the “one-off”
services offered by specialist providers, CaaS delivers a complete communications
solution that is entirely managed by a single vendor. Along with features such as VoIP
and unified communications, the integration of core PBX features with advanced
functionality is managed by one vendor, who is responsible for all of the integration and
delivery of services to users.
Fully Integrated, Enterprise-Class Unified Communications
With CaaS, the vendor provides voice and data access and manages
LAN/WAN, security, routers, email, voice mail, and data storage. By managing the
LAN/WAN, the vendor can guarantee consistent quality of service from a user’s desktop
across the network and back. Advanced unified communications features that are most
often a part of a standard CaaS deployment include:
• Chat
• Multimedia conferencing
• Microsoft Outlook integration
• Real-time presence
• “Soft” phones (software-based telephones)
• Video calling
• Unified messaging and mobility
Providers are constantly offering new enhancements (in both performance and features)
to their CaaS services. The development process and subsequent introduction of new
features in applications is much faster, easier, and more economical than ever before.
This is, in large part, because the service provider is doing work that benefits many end
users across the provider’s scalable platform infrastructure. Because many end users of
the provider’s service ultimately share this cost (which, from their perspective, is
miniscule compared to shouldering the burden alone), services can be offered to
individual customers at a cost that is attractive to them.
No Capital Expenses Needed
When businesses outsource their unified communications needs to a CaaS service
provider, the provider supplies a complete solution that fits the company’s exact needs.
Customers pay a fee (usually billed monthly) for what they use. Customers are not
required to purchase equipment, so there is no capital outlay. Bundled in these types of
services are ongoing maintenance and upgrade costs, which are incurred by the service
provider. The use of CaaS services allows companies the ability to collaborate across any
workspace. Advanced collaboration tools are now used to create high-quality, secure,
adaptive work spaces throughout any organization. This allows a company’s workers,
partners, vendors, and customers to communicate and collaborate more effectively.
Better communication allows organizations to adapt quickly to market changes and to
build competitive advantage. CaaS can also accelerate decision making within an
organization. Innovative unified communications capabilities (such as presence, instant
messaging, and rich media services) help ensure that information quickly reaches
whoever needs it.
Flexible Capacity and Feature Set
When customers outsource communications services to a CaaS provider, they pay for
the features they need when they need them. The service provider can distribute the cost
of services and delivery across a large customer base. As previously stated, this makes the
use of shared feature functionality more economical for customers to implement.
Economies of scale allow service providers enough flexibility that they are not tied to a
single vendor investment. They are able to leverage best-of-breed providers such as
Avaya, Cisco, Juniper, Microsoft, Nortel, and ShoreTel more economically
than any independent enterprise.
No Risk of Obsolescence
Rapid technology advances, predicted long ago and known as Moore’s law,
have brought about product obsolescence in increasingly shorter periods of time.
Moore’s law describes a trend Gordon Moore recognized that has held true since the beginning of the
use of integrated circuits (ICs) in computing hardware. Since the invention of the
integrated circuit in 1958, the number of transistors that can be placed inexpensively on
an integrated circuit has increased exponentially, doubling approximately every two
years. Unlike IC components, the average life cycles for PBXs and key communications
equipment and systems range anywhere from five to 10 years. With the constant
introduction of newer models for all sorts of technology (PCs, cell phones, video
software and hardware, etc.), these types of products now face much shorter life cycles,
sometimes as short as a single year. CaaS vendors must absorb this burden for the user
by continuously upgrading the equipment in their offerings to meet changing demands
in the marketplace.
No Facilities and Engineering Costs Incurred
CaaS providers host all of the equipment needed to provide their services to their
customers, virtually eliminating the need for customers to maintain data center space
and facilities. There is no extra expense for the constant power consumption that such a
facility would demand. Customers receive the benefit of multiple carrier-grade data
centers with full redundancy—and it’s all included in the monthly payment.
Guaranteed Business Continuity
If a catastrophic event occurred at your business’s physical location, would your
company disaster recovery plan allow your business to continue operating without a
break? If your business experienced a serious or extended communications outage, how
long could your company survive? For most businesses, the answer is “not long.”
Distributing risk by using geographically dispersed data centers has become the norm
today. It mitigates risk and allows companies in a location hit by a catastrophic event to
recover as soon as possible. This process is implemented by CaaS providers because
most companies don’t even contemplate voice continuity if catastrophe strikes. Unlike
data continuity, eliminating single points of failure for a voice network is usually cost-
prohibitive because of the large scale and management complexity of the project. With a
CaaS solution, multiple levels of redundancy are built into the system, with no single
point of failure.
Database-as-a-Service (DBaaS)
Database as a service (DBaaS) is a cloud computing service model that provides users
with some form of access to a database without the need for setting up physical
hardware, installing software or configuring for performance. All of the administrative
tasks and maintenance are taken care of by the service provider so that all the user or
application owner needs to do is use the database. Of course, if the customer opts for
more control over the database, this option is available and may vary depending on the
provider. Database-as-a-Service (DBaaS) is the fastest-growing cloud service and a
component of Platform-as-a-Service; it provides dramatic improvements in the
productivity, performance, standardization, and data security of databases.
The term “Database-as-a-Service” (DBaaS) refers to software that enables users to
provision, manage, consume, configure, and operate database software using a common
set of abstractions (primitives), without having to know or care about the exact
implementations of those abstractions for the specific database software.
In other words, a DBaaS user could provision a MySQL database, manage, configure
and operate it using the same set of API calls as he (or she) would use if it were an
Oracle or MongoDB database. The user would be able, for example, to request a backup
of the database using an API call which did the right thing(s) for the database that was
being used. Similarly, the user could request a MySQL cluster or a MongoDB cluster,
and then resize that cluster using the same API call(s), without having to know exactly
how that operation was being performed for each of those database technologies.
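One way to picture this common set of primitives is an abstract interface with engine-specific implementations behind it. The sketch below is illustrative only; the class and method names are invented, not any product's API:

    # Sketch of DBaaS-style abstractions: one API, many database engines.
    # Class and method names are illustrative, not a real product's API.

    from abc import ABC, abstractmethod

    class ManagedDatabase(ABC):
        @abstractmethod
        def backup(self) -> str: ...
        @abstractmethod
        def resize_cluster(self, nodes: int) -> None: ...

    class MySQLService(ManagedDatabase):
        def backup(self) -> str:
            return "mysqldump-based snapshot"  # engine-specific detail, hidden
        def resize_cluster(self, nodes: int) -> None:
            print(f"reconfiguring MySQL replication for {nodes} nodes")

    class MongoDBService(ManagedDatabase):
        def backup(self) -> str:
            return "mongodump-based snapshot"
        def resize_cluster(self, nodes: int) -> None:
            print(f"resizing MongoDB replica set to {nodes} members")

    def nightly_maintenance(db: ManagedDatabase) -> None:
        # The caller neither knows nor cares which engine is underneath.
        print("backup:", db.backup())
        db.resize_cluster(3)

    nightly_maintenance(MySQLService())
    nightly_maintenance(MongoDBService())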
DBaaS is often considered to be a component of a Platform-as-a-Service, the “platform”
in this case being the database (or a number of databases). The DBaaS solution would
consume resources of the underlying Infrastructure-as-a-Service (IaaS), for example
provisioning compute, storage and networking from that IaaS.
DBaaS in the context of other cloud components
It is important to understand that, like other cloud technologies, DBaaS has two
primary consumers:
• The IT organization (which operates the cloud and very often also includes the DBAs)
• The developer (sometimes DevOps, or the end user who uses the cloud resources)
An IT organization deploys DBaaS that enables end users (developers) to provision a
database of their choice from a catalog of supported databases. These could include
popular relational and non-relational databases and the IT organization can configure
the DBaaS to support specific releases of these software titles. The IT organization can
further restrict the configurations that specific users can provision (for example,
developers can provision only servers with a small memory footprint and traditional
disks, while DevOps engineers can provision higher-capacity servers with SSDs). Finally,
the IT organization can set up policies for standard database operations, like backups, to
ensure that the data
is properly saved from time to time to allow for recovery when required.
Typically an end user would access the DBaaS system through a portal that allows him
or her to choose from a number of database titles, and in a variety of different
configuration options. With a few clicks, the requested database is provisioned for
them. The DBaaS system quickly provisions the database and returns a queryable
endpoint like:
mysql://192.168.15.243:3306/
and the application developer can use this in his or her application directly. The DBaaS
system provides simple mechanisms to add users, create databases (schemas) and grant
permissions to different users as required by the application.
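As a sketch of what the developer does next, the following PyMySQL session connects to a provisioned endpoint, creates a database (schema), and grants a user permissions; the host, credentials, and names are placeholders that a real DBaaS portal would supply:

    # Sketch of using a provisioned DBaaS endpoint (PyMySQL client).
    # Host, credentials, and names are placeholders from a hypothetical portal.

    import pymysql

    conn = pymysql.connect(
        host="192.168.15.243", port=3306,
        user="admin", password="change-me",
    )
    with conn.cursor() as cur:
        # Create a database (schema) for the application...
        cur.execute("CREATE DATABASE IF NOT EXISTS orders_app")
        # ...add an application user and grant it permissions.
        cur.execute("CREATE USER IF NOT EXISTS 'app'@'%' IDENTIFIED BY 's3cret'")
        cur.execute("GRANT SELECT, INSERT, UPDATE ON orders_app.* TO 'app'@'%'")
    conn.commit()
    conn.close()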
The benefits of DBaaS
A DBaaS solution provides an organization a number of benefits, the chief among them
being:
• Developer agility
• DBA productivity
• Application reliability, performance and security
We now examine each of these in turn.
Developer agility
When a developer wishes to provision a database, the steps involved include
provisioning compute, storage and networking components, configuring them properly
and then installing database software. Finally, the database software must be configured
properly to utilize the underlying infrastructure components.
This multi-step process leaves many opportunities for errors, omissions and non-
standard modes of operation. When the thing that is being provisioned (a database) is
the “system of record”, this is unacceptable.
The IT organization in configuring the DBaaS establishes the standards by which
databases will be provisioned. By standardizing the provisioning model, DBaaS ensures
that a database can be provisioned in a single operation, and that databases are
provisioned in a consistent way, and in a manner that is aligned with the best practices
for that particular database and business.
Once in operation, complex database operations like resizing a cluster become a simple
API call, and the developer need not be concerned with the minutiae of how this
operation should be performed for the specific database and version. The
abstraction provided by the DBaaS handles all of that and allows the developer to focus
his or her energy on the application rather than the underlying database.
Finally, the activities of a developer are often iterative and involve spinning up, using,
and then destroying database servers. Abstractions in the DBaaS allow for the final step
in this process to be automated as well, securely erasing all storage used for the database
and ensuring that all the resources are released and that data integrity is preserved at all
times.
DBA productivity
When an enterprise operates hundreds of instances of many different databases,
considerable resources are consumed on maintenance and upkeep. This includes things
like tuning, configuration, patching, periodic backups, and so on; all the things that
DBAs have to do to keep databases in proper working order.
DBaaS solutions provide abstractions that allow DBAs to manage groups of databases
and perform operations like upgrades and configuration changes on a fleet of databases
in a simplified way. This frees up the DBAs to focus more on activities like establishing
the standards of operation for the enterprise and verifying that they have the best tools
available for themselves and the developers who they serve.
Application reliability, performance and security
Databases are often the “system of record” and are the repository of valuable
information in the organization. A database outage could have catastrophic impact.
Through automation and standardization, DBaaS ensures that all common workflows
involved in the provisioning, configuration, management, and operation of databases
are consistent.
Through this standardization, a DBaaS ensures that all databases are operated in the
same way, and in keeping with the best practices established by the IT organization.
This frees up the developer and the DBA to work on more important things like the
application and innovation rather than the boring minutiae of running a database.
It is important to realize that most enterprises today operate applications that require
many different database technologies, a departure from recent years where the
‘corporate standard’ mandated a single database solution for all application needs. With
this diversity in database technologies, DBaaS solutions allow IT organizations to ensure
application reliability, performance and data security no matter what database solution
is in use, without requiring that the IT organization or the developer team have deep
knowledge of the finer points of each of the technologies. DBaaS solutions encapsulate
those best practices and codify the proper way(s) to deploy, manage and operate all of
the different technologies thereby freeing up the DBAs and developers from these
chores.
Comparison of some DBaaS solutions
The most widely used DBaaS in the market today is Amazon Relational Database Service
(RDS). RDS provides support for a number of databases including MySQL, MariaDB,
Oracle, PostgreSQL and SQL Server. In addition, Amazon also provides Aurora and
DynamoDB. Aurora is a scalable relational database compatible with MySQL or
PostgreSQL while DynamoDB is a scalable NoSQL database.
Microsoft offers SQL Database as part of the Azure Cloud platform, and Google offers
Cloud SQL, a fully managed MySQL database service.
With the exception of DynamoDB, all of these are DBaaS solutions that provide
management abstractions but no data API. The application that uses the database
interacts directly with the managed database in these cases. In DynamoDB, however, the
service offers a data API as well.
In the OpenStack ecosystem, the Trove project offers a DBaaS that supports a number of
relational and non-relational database packages including most commonly used FOSS
databases.
The value of a DBaaS comes through the standardization of the abstractions, and
through the common API. Since the most widely used Cloud API in the world today is
Amazon’s AWS API, there is considerable value in implementing a solution that exposes
its services using that same API. This is the approach that Stratoscale Symphony has
adopted. Symphony exposes the same APIs defined by AWS and allows you to
provision an AWS region and RDS-compatible DBaaS in your own data center.
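For a flavor of that AWS-style API, the sketch below provisions a small managed MySQL instance through the RDS API using boto3; the identifiers and credentials are placeholders, and an RDS-compatible endpoint would accept the same call shape:

    # Sketch of provisioning a managed MySQL database via the RDS API (boto3).
    # Identifiers and credentials are placeholders.

    import boto3

    rds = boto3.client("rds", region_name="us-east-1")

    rds.create_db_instance(
        DBInstanceIdentifier="orders-db",
        DBInstanceClass="db.t3.micro",
        Engine="mysql",
        MasterUsername="admin",
        MasterUserPassword="change-me",
        AllocatedStorage=20,  # GB
    )

    # Poll until the endpoint is ready, then hand it to the application.
    waiter = rds.get_waiter("db_instance_available")
    waiter.wait(DBInstanceIdentifier="orders-db")
    info = rds.describe_db_instances(DBInstanceIdentifier="orders-db")
    print(info["DBInstances"][0]["Endpoint"]["Address"])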
SERVICE PROVIDERS
Google App Engine
Google App Engine enables developers to build their web apps on the same
infrastructure that powers Google’s own applications.
Features
Leveraging Google App Engine, developers can accomplish the following tasks:
• Write code once and deploy. Provisioning and configuring multiple machines for web
serving and data storage can be expensive and time-consuming. Google App Engine
makes it easier to deploy web applications by dynamically providing computing
resources as they are needed. Developers write the code, and Google App Engine takes
care of the rest.
• Absorb spikes in traffic. When a web app surges in popularity, the sudden increase in
traffic can be overwhelming for applications of all sizes, from startups to large
companies that find themselves re-architecting their databases and entire systems
several times a year. With automatic replication and load balancing, Google App Engine
makes it easier to scale from one user to one million by taking advantage of Bigtable and
other components of Google’s scalable infrastructure.
• Easily integrate with other Google services. It’s unnecessary and inefficient for
developers to write components like authentication and email from scratch for each new
application. Developers using Google App Engine can make use of built-in components
and Google’s broader library of APIs that provide plug-and-play functionality for simple
but important features.
“Google has spent years developing infrastructure for scalable web applications,” said
Pete Koomen, a product manager at Google. “We’ve brought Gmail and Google search to
hundreds of millions of people worldwide, and we’ve built out a powerful network of
datacenters to support those applications. Today we’re taking the first step in making
this infrastructure available to all developers.”
Cost
Google enticed developers by offering App Engine for free when it launched, but after a
few months slapped on some fees. As of this writing, developers using Google App
Engine can expect to pay:
• Free quota to get started: 500MB storage and enough CPU and bandwidth for about 5
million pageviews per month
• $0.10–$0.12 per CPU core-hour
• $0.15–$0.18 per GB-month of storage
• $0.11–$0.13 per GB of outgoing bandwidth
• $0.09–$0.11 per GB of incoming bandwidth
In response to developer feedback, Google App Engine will provide new APIs. The
image-manipulation API enables developers to scale, rotate, and crop images on the
server. The memcache API is a high-performance caching layer designed to make page
rendering faster for developers. More information about Google App Engine is available
at http://code.google.com/appengine/
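To illustrate the caching layer just mentioned, here is a short sketch against the legacy Python App Engine memcache API; the key and the render_page() helper are invented, and newer App Engine runtimes use external cache services instead:

    # Sketch of page caching with the legacy App Engine memcache API (Python).
    # Key name and render_page() are invented for the example.

    from google.appengine.api import memcache

    def render_page() -> str:
        return "<html>...expensive page...</html>"

    def get_page() -> str:
        page = memcache.get("home_page")      # fast path: cache hit
        if page is None:
            page = render_page()              # slow path: rebuild the page
            memcache.set("home_page", page, time=60)  # cache for 60 seconds
        return page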
Amazon Elastic Compute Cloud (Amazon EC2)
Amazon may be the most widely known cloud vendor. They offer services on many
different fronts, from storage to platform to databases. Amazon seems to have their
finger in a number of cloud technologies.
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that offers resizable
compute capacity in the cloud and is designed to make web-scale computing easier for developers.
Amazon EC2 provides a simple web interface that allows you to obtain and configure
capacity with little difficulty. It gives you control of your computing resources. Amazon
EC2 cuts the time it takes to obtain and boot new server instances to a few minutes,
allowing you to change scale as your needs change. For instance, Amazon EC2 can run
Microsoft Windows Server 2003 and is a way to deploy applications using the Microsoft
Web Platform, including ASP.NET, ASP.NET AJAX, Silverlight, and Internet
Information Server (IIS).
Amazon EC2 allows you to run Windows-based applications on Amazon’s cloud
computing platform. This might be web sites, web-service hosting, high-performance
computing, data processing, media transcoding, ASP.NET application hosting, or any
other application requiring Windows software. EC2 also supports SQL Server Express
and SQL Server Standard and makes those offerings available to customers on an hourly
basis.
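As a rough illustration of how a developer might obtain capacity programmatically, the
sketch below uses the boto Python library (from the same era as EC2’s early Windows
support) to launch an instance. The AMI ID, key pair name, and security group are
placeholders rather than real resources, and credentials are assumed to be configured
in the environment.

# Launch a single instance on Amazon EC2 using the boto (version 2) API.
import boto.ec2

conn = boto.ec2.connect_to_region('us-east-1')   # reads AWS credentials
                                                 # from env/config
reservation = conn.run_instances(
    'ami-00000000',                  # placeholder Windows Server AMI ID
    instance_type='m1.small',
    key_name='my-keypair',           # placeholder key pair
    security_groups=['my-group'],    # placeholder security group
)
instance = reservation.instances[0]
print('Launched instance %s' % instance.id)

Terminating the instance when it is no longer needed (conn.terminate_instances) is
what keeps the hourly billing model economical.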
MICROSOFT AZURE
Microsoft offers a number of cloud services for organizations of any size—from
enterprises all the way down to mom-and-pop shops or individuals. A good portion of
Microsoft’s cloud offerings are cloud variants of products that people already use, so the
cloud versions are not difficult to adopt.
Azure Services Platform
The cornerstone of Microsoft’s offerings is the Azure Services Platform. The Azure
Services Platform is a cloud computing and services platform hosted in Microsoft
datacenters. The Azure Services Platform supplies a broad range of functionality to build
applications to serve individuals or large enterprises, and everyone in between. The
platform offers a cloud operating system and developer tools. Applications can be
developed with industry standard protocols like REST and SOAP. Azure services can be
used individually or in conjunction with one another to build new applications or to
enhance existing ones. Let’s take a closer look at the Azure Services Platform
components.
Windows Azure
Windows Azure is a cloud-based operating system that provides the development,
hosting, and service management environment for the Azure Services Platform.
Windows Azure gives developers an on-demand compute and storage environment that
they can use to host, scale, and manage web applications through Microsoft datacenters.
To build applications and services, developers can use the Visual Studio skills they
already have. Further, Azure supports existing standards like SOAP, REST, and XML.
Windows Azure can be used to
• Add web service capabilities to existing applications
• Build and modify applications and then move them onto the Web
• Make, test, debug, and distribute web services efficiently and inexpensively
• Reduce the costs of IT management
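Because the platform speaks industry-standard protocols, clients on any stack can
reach it with ordinary HTTP. As a hedged sketch, the snippet below reads a hypothetical
publicly readable blob over plain REST; the storage account and container names are
invented, and an authenticated request would additionally carry a signed Authorization
header, which is omitted here.

# Fetch a publicly readable blob from Azure storage over plain REST.
import requests

url = 'https://myaccount.blob.core.windows.net/mycontainer/hello.txt'
resp = requests.get(url)       # a simple HTTP GET; no SDK required
resp.raise_for_status()
print(resp.text)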
SALESFORCE.COM
Salesforce.com made its name with the success of its flagship sales force automation
application. Today, the company has three primary areas of focus:
• The Sales Cloud: The popular cloud computing sales application
• The Service Cloud: The platform for customer service that lets companies tap into the
power of customer conversations no matter where they take place
• Your Cloud: Powerful capabilities to develop custom applications on its cloud
computing platform, Force.com
The company has made its platform available to other companies as a place to build and
deploy their software services. Force.com offers
• A relational database
• User interface options
• Business logic
• Apex, an integrated programming language
• Workflow and approvals engine
• Programmable interface
• Automatic mobile device deployment
• Web services integration
• Reporting and analytics
Using Apex, programmers can test their applications in Force.com’s Sandboxes and
then offer the finalized code on Salesforce.com’s site. Developers initially used
Force.com to create add-ons to the Salesforce CRM, but now it is possible to develop
applications that are unrelated to Salesforce.com’s offerings. For instance, gaming giant
Electronic Arts created an employee-recruiting application on Force.com and software
vendor Coda made a general ledger application. Meanwhile, Salesforce.com promotes its
own applications, which are used by more than 1.1 million people. Salesforce.com offers
other cloud services as well.
In April 2007 it moved into enterprise content management with Salesforce.com
Content. This makes it possible to store, classify, and share information in a manner
similar to Microsoft SharePoint. The company employs a multitenant architecture,
similar to Google, Amazon, and eBay. As such, servers and other resources are shared by
customers, rather than dedicated to a single account. This allows for better performance, better
scalability, better security, and faster innovation through automatic upgrades.
Multitenancy also allows apps to be elastic—they can scale up to tens of thousands of
users, or down to just a few—always something to consider when moving to cloud-based
solutions. As with other providers, upgrades are taken care of by Salesforce.com for
their customers, so apps get security and performance enhancements automatically.
Because the company generates all its income based on cloud computing,
Salesforce.com is a good bellwether for assessing the growth rate of the application side
of cloud computing. Salesforce.com’s revenue grew to US$290 million in the quarter
ending January 31, 2009— a 34 percent increase year-over-year.
Force.com
Force.com is Salesforce.com’s on-demand cloud computing platform—billed by
Salesforce.com as the world’s first PaaS. Force.com features Visualforce, a technology
that makes it much simpler for end customers, developers, and independent software
vendors (ISVs) to design almost any type of cloud application for a wide range of uses.
The Force.com platform offers global infrastructure and services for database, logic,
workflow, integration, user interface, and application exchange. Visualforce is
essentially a framework for creating new interface designs and enables user interactions
that can be built and delivered with no software or hardware infrastructure
requirements.
PaaS
Force.com delivers PaaS, a way to create and deploy business apps that allows
companies and developers to focus on what their applications do, rather than the
software and infrastructure to run them. The Force.com platform can run multiple
applications within the same Salesforce.com instance, allowing all of a company’s
Salesforce.com applications to share a common security model, data model, and user
interface. This is a major benefit found in cloud computing solutions. Add to that an on-
demand operating system, the ability to create any database on demand, a workflow
engine for managing collaboration between users, and a programming language for
building complex logic. A web services API for programmatic access, mash-ups, and
integration with other applications and data is another key feature.
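As a hedged sketch of that programmatic interface, the snippet below issues a SOQL
query through the Force.com REST endpoint from Python. The instance URL, API
version, and access token are placeholders; obtaining the token (for example, via
OAuth) is not shown.

# Query Force.com data through its web services (REST) API.
import requests

INSTANCE = 'https://na1.salesforce.com'   # placeholder instance URL
TOKEN = '<access-token>'                  # placeholder OAuth access token

resp = requests.get(
    INSTANCE + '/services/data/v20.0/query',
    params={'q': 'SELECT Name FROM Account LIMIT 5'},
    headers={'Authorization': 'Bearer ' + TOKEN},
)
resp.raise_for_status()
for record in resp.json()['records']:
    print(record['Name'])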
Visualforce
As part of the Force.com platform, Visualforce provides the ability to design application
user interfaces for practically any experience on any screen. Visualforce uses HTML,
AJAX, and Flex for business applications. Visualforce provides a page-based model,
built on standard HTML and web presentation technologies, and is complemented with
both a component library for implementing common user interface elements, and a
controller model for creating new interactions between those elements. Visualforce
features and capabilities include
• Pages: Enables the design definition of an application’s user interface.
• Components: Provides the ability to create new applications that automatically
match the look and feel of Salesforce.com applications or easily customize and extend
the Salesforce.com user interface to specific requirements.
• Logic Controllers: The controller enables customers to build any user interface
behavior.
Salesforce.com CRM
Salesforce.com is a leader in cloud computing customer relationship management
(CRM) applications. Its CRM offering consists of the Sales Cloud and the Service Cloud
and can be broken down into five core applications:
• Sales: Easily the most popular cloud computing sales application. Salesforce.com says
that CRM Sales is used by more than 1.1 million customers around the world. Its claim
to fame is that it is comprehensive and easy to customize. Its value proposition is that it
empowers companies to manage people and processes more effectively, so reps can
spend more time selling and less time on administrative tasks.
• Marketing: With Salesforce.com CRM Marketing, marketers can put the latest web
technologies to work building pipeline while collaborating seamlessly with their sales
organization. The application empowers customers to manage multichannel campaigns
and provide up-to-date messaging to sales. And since the application is integrated with
the Salesforce.com CRM Sales application, the handoff of leads is automated.
• Service: The Service Cloud is the new platform for customer service. Companies can
tap into the power of customer conversations no matter where they take place. Because
it’s on the Web, the Service Cloud allows companies to instantly connect to collaborate
in real time, share sales information, and follow joint processes. Connecting with
partners is made to be as easy as connecting with people on LinkedIn: companies
instantly share leads, opportunities, accounts, contacts, and tasks with their partners.
• Collaboration: Salesforce.com CRM can help an organization work more efficiently
with customers, partners, and employees by allowing them to collaborate among
themselves in the cloud. Some of the capabilities include
• Create and share content in real time using Google Apps and Salesforce.com
• Track and deliver presentations using Content Library
• Give your community a voice using Ideas and Facebook
• Tap into the collective wisdom of the sales team with Genius
• Analytics: Force.com offers real-time reporting, calculations, and dashboards so a
business is better able to optimize performance, decision making, and resource
allocation.
• Custom Applications: Custom applications can be quickly created by leveraging one
data model, one sharing model, and one user interface.
MapReduce
What does MapReduce mean?
MapReduce is a programming model introduced by Google for processing and
generating large data sets on clusters of computers. Google first formulated the
framework for the purpose of serving Google’s Web page indexing, and the new
framework replaced earlier indexing algorithms. Even beginner developers find the
MapReduce framework beneficial because library routines can be used to create parallel
programs without any worries about intra-cluster communication, task monitoring, or
failure-handling processes.
How MapReduce Works
The MapReduce algorithm contains two important tasks, namely Map and Reduce.
• The Map task takes a set of data and converts it into another set of data, where
individual elements are broken down into tuples (key-value pairs).
• The Reduce task takes the output from the Map as an input and combines those
data tuples (key-value pairs) into a smaller set of tuples.
The reduce task is always performed after the map job. A MapReduce job passes
through the following phases (a small worked sketch appears after this list):
• Input Phase − Here we have a Record Reader that translates each record in an input
file and sends the parsed data to the mapper in the form of key-value pairs.
• Map − Map is a user-defined function, which takes a series of key-value pairs and
processes each one of them to generate zero or more key-value pairs.
• Intermediate Keys − The key-value pairs generated by the mapper are known as
intermediate keys.
• Combiner − A combiner is a type of local Reducer that groups similar data from the
map phase into identifiable sets. It takes the intermediate keys from the mapper as
input and applies a user-defined code to aggregate the values in a small scope of one
mapper. It is not a part of the main MapReduce algorithm; it is optional.
• Shuffle and Sort − The Reducer task starts with the Shuffle and Sort step. It
downloads the grouped key-value pairs onto the local machine, where the Reducer is
running. The individual key-value pairs are sorted by key into a larger data list. The data
list groups the equivalent keys together so that their values can be iterated easily in the
Reducer task.
• Reducer − The Reducer takes the grouped key-value paired data as input and runs a
Reducer function on each group. Here, the data can be aggregated, filtered, and
combined in a number of ways, which can require a wide range of processing. Once the
execution is over, it gives zero or more key-value pairs to the final step.
• Output Phase − In the output phase, we have an output formatter that translates the
final key-value pairs from the Reducer function and writes them onto a file using a
record writer.
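The pure-Python sketch below walks word count through the phases just described. It
runs on one machine purely for illustration (the sample records and split boundaries
are our own); a real framework executes the same steps in parallel across a cluster.

# Word count traced through the MapReduce phases on a single machine.
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Input phase: the record reader hands records (lines) to each mapper.
splits = [['the quick brown fox', 'the lazy dog'],   # mapper 1's split
          ['the quick dog']]                         # mapper 2's split

def mapper(record):
    # Map: emit zero or more intermediate (key, value) pairs per record.
    for word in record.split():
        yield (word, 1)

def combiner(pairs):
    # Combiner (optional): locally pre-aggregate one mapper's output.
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())

map_outputs = []
for split in splits:
    pairs = [p for record in split for p in mapper(record)]
    map_outputs.extend(combiner(pairs))

# Shuffle and sort: collect all intermediate pairs, ordered by key.
map_outputs.sort(key=itemgetter(0))

# Reduce: fold each key's grouped values into a final pair; the output
# phase's record writer would then persist these results.
for word, group in groupby(map_outputs, key=itemgetter(0)):
    print(word, sum(value for _, value in group))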
Each day, numerous MapReduce programs and MapReduce jobs are executed on Google's
clusters. Programs are automatically parallelized and executed on a large cluster of commodity
machines. The runtime system deals with partitioning the input data, scheduling the program's
execution across a set of machines, machine failure handling and managing required
intermachine communication. Programmers without any experience with parallel and distributed
systems can easily use the resources of a large distributed system.
GOOGLE FILE SYSTEM (GFS)
Google File System (GFS) is a scalable distributed file system (DFS) created by Google Inc. and
developed to accommodate Google’s expanding data processing requirements. GFS provides
fault tolerance, reliability, scalability, availability and performance to large networks and
connected nodes. GFS is made up of several storage systems built from low-cost commodity
hardware components. It is optimized to accommodate Google's different data use and storage
needs, such as its search engine, which generates huge amounts of data that must be stored.
The Google File System capitalized on the strength of off-the-shelf servers while minimizing
hardware weaknesses. GFS is also known as GoogleFS.
The GFS node cluster is a single master with multiple chunk servers that are continuously
accessed by different client systems. Chunk servers store data as Linux files on local disks.
Stored data is divided into large chunks (64 MB), which are replicated in the network a minimum
of three times. The large chunk size reduces network overhead.
GFS is designed to accommodate Google’s large cluster requirements without burdening
applications. Files are stored in hierarchical directories identified by path names. Metadata - such
as namespace, access control data, and mapping information - is controlled by the master, which
interacts with and monitors the status updates of each chunk server through timed heartbeat
messages.
GFS features include:
• Fault tolerance
• Critical data replication
• Automatic and efficient data recovery
• High aggregate throughput
• Reduced client and master interaction because of the large chunk size
• Namespace management and locking
• High availability
The largest GFS clusters have more than 1,000 nodes with 300 TB disk storage capacity. This
can be accessed by hundreds of clients on a continuous basis.
General architecture of Google File System
GFS is organized as clusters of computers. A cluster is simply a network of computers, and
each cluster might contain hundreds or even thousands of machines. In each GFS cluster
there are three main entities:
1. Clients
2. Master servers
3. Chunk servers
Clients can be other computers or computer applications that make file requests. Requests can range from retrieving and manipulating existing files to creating new files on the system. Clients can be thought of as the customers of the GFS.
The master server is the coordinator for the cluster. Its tasks include:
1. Maintaining an operation log that keeps track of the activities of the cluster. The operation log helps keep service interruptions to a minimum: if the master server crashes, a replacement server that has monitored the operation log can take its place.
2. Keeping track of metadata, which is the information that describes chunks. The metadata tells the master server to which files the chunks belong and where they fit within the overall file.
Chunk Servers are the workhorses of the GFS. They store 64-MB file chunks. The chunk servers don't send chunks to the master server. Instead, they send requested chunks directly to the client. The GFS copies every chunk multiple times and stores it on different chunk servers. Each copy is called a replica. By default, the GFS makes three replicas per chunk, but users can change the setting and make more or fewer replicas if desired.
Managing the load on the single master in Google File System
Having a single master enables the master to make sophisticated chunk placement and replication decisions using global knowledge. However, the involvement of master in reads and writes must be minimized so that it does not become a bottleneck. Clients never read and write file data through the master. Instead, a client asks the master which chunk servers it should contact. It caches this information for a limited time and interacts with the chunk servers directly for many subsequent operations.
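The toy sketch below illustrates that pattern: the client consults the master only for
chunk locations, caches the reply for a limited time, and streams data from the chunk
servers directly. The class names, TTL value, and server names are all invented for
illustration.

# Why the single master is not a bottleneck: clients cache chunk locations.
import time

class Master:
    def __init__(self, locations):
        self.locations = locations        # (file, chunk_index) -> servers

    def lookup(self, filename, chunk_index):
        return self.locations[(filename, chunk_index)]

class Client:
    CACHE_TTL = 60.0                      # seconds; illustrative value

    def __init__(self, master):
        self.master = master
        self.cache = {}                   # key -> (servers, expiry time)

    def chunk_servers(self, filename, chunk_index):
        key = (filename, chunk_index)
        entry = self.cache.get(key)
        if entry and entry[1] > time.time():
            return entry[0]               # cache hit: master not contacted
        servers = self.master.lookup(filename, chunk_index)
        self.cache[key] = (servers, time.time() + self.CACHE_TTL)
        return servers                    # data I/O now goes to these servers

master = Master({('logs/day1', 0): ['chunkserver-3', 'chunkserver-7']})
client = Client(master)
print(client.chunk_servers('logs/day1', 0))  # asks the master once
print(client.chunk_servers('logs/day1', 0))  # answered from the cache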
General scenario of client request handling by GFS
File requests follow a standard work flow. A read request is simple; the client sends a request to the master server to find out where the client can find a particular file on the system. The server responds with the location for the primary replica of the respective chunk. The primary replica holds a lease from the master server for the chunk in question.
If no replica currently holds a lease, the master server designates a chunk as the primary. It does this by comparing the IP address of the client to the addresses of the chunk servers containing the replicas. The master server chooses the chunk server closest to the client. That chunk server's chunk becomes the primary. The client then contacts the appropriate chunk server directly, which sends the replica to the client.
Write requests are a little more complicated. The client still sends a request to the master server, which replies with the location of the primary and secondary replicas. The client stores this information in a memory cache. That way, if the client needs to refer to the same
replica later on, it can bypass the master server. If the primary replica becomes unavailable or the replica changes then the client will have to consult the master server again before contacting a chunk server.
The client then sends the write data to all the replicas, starting with the closest replica and ending with the furthest one. It doesn't matter if the closest replica is a primary or secondary. Google compares this data delivery method to a pipeline.
Once the replicas receive the data, the primary replica begins to assign consecutive serial numbers to each change to the file. Changes are called mutations. The serial numbers instruct the replicas on how to order each mutation. The primary then applies the mutations in sequential order to its own data. Then it sends a write request to the secondary replicas, which follow the same application process. If everything works as it should, all the replicas across the cluster incorporate the new data. The secondary replicas report back to the primary once the application process is over.
At that time, the primary replica reports back to the client. If the process was successful, it ends here. If not, the primary replica tells the client what happened. For example, if one secondary replica failed to update with a particular mutation, the primary replica notifies the client and retries the mutation application several more times. If the secondary replica doesn't update correctly, the primary replica tells the secondary replica to start over from the beginning of the write process. If that doesn't work, the master server will identify the affected replica as garbage.
Advantages and disadvantages of large sized chunks in Google File System
Chunk size is one of the key design parameters. In GFS it is 64 MB, which is much larger than typical file system block sizes. Each chunk replica is stored as a plain Linux file on a chunk server and is extended only as needed.
Advantages
1. It reduces clients’ need to interact with the master because reads and writes on the same chunk require only one initial request to the master for chunk location information.
2. Since a client is more likely to perform many operations on a given large chunk, it can reduce network overhead by keeping a persistent TCP connection to the chunk server over an extended period of time.
3. It reduces the size of the metadata stored on the master. This allows us to keep the metadata in memory, which in turn brings other advantages.
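A quick back-of-envelope calculation shows why advantage 3 matters. The figure of
roughly 64 bytes of metadata per chunk is an assumption adopted here for illustration.

# Estimate the master's in-memory metadata for a petabyte of file data.
TB = 10**12
data_bytes = 1000 * TB               # one petabyte of stored file data
chunk_size = 64 * 2**20              # 64 MB chunks
meta_per_chunk = 64                  # assumed bytes of metadata per chunk

chunks = data_bytes // chunk_size
print('chunks: %d' % chunks)                                    # ~15 million
print('metadata: %.2f GB' % (chunks * meta_per_chunk / 2**30))  # ~0.9 GB

Under these assumptions, even a petabyte of data needs under a gigabyte of chunk
metadata, which comfortably fits in the master's memory.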
Disadvantages
1. Internal fragmentation: a small file could leave most of a large chunk unused, although in practice lazy space allocation avoids wasting disk space due to internal fragmentation.
2. Even with lazy space allocation, a small file consists of a small number of chunks, perhaps just one. The chunk servers storing those chunks may become hot spots if many clients are accessing the same file. In practice, hot spots have not been a major issue because the applications mostly read large multi-chunk files sequentially. To mitigate hot spots, such files can be stored with a higher replication factor, and clients can be allowed to read from other clients.
HDFS (Hadoop Distributed File System)
HDFS is a distributed file system allowing multiple files to be stored and retrieved at the same time at high speed. It is one of the basic components of the Hadoop framework. Hadoop File System was developed using distributed file system design and runs on commodity hardware. Unlike many other distributed systems, HDFS is highly fault tolerant even though it is built from low-cost hardware. HDFS holds very large amounts of data and provides easy access. To store such huge data, the files are stored across multiple machines. These files are stored in redundant fashion to rescue the system from possible data losses in case of failure. HDFS also makes data available to applications for parallel processing.
HDFS is a key part of the many Hadoop ecosystem technologies, as it provides a
reliable means for managing pools of big data and supporting related big data
analytics applications.
How HDFS works
HDFS supports the rapid transfer of data between compute nodes. At its outset, it
was closely coupled with MapReduce, a programmatic framework for data
processing.
When HDFS takes in data, it breaks the information down into separate blocks and
distributes them to different nodes in a cluster, thus enabling highly efficient parallel
processing.
Moreover, the Hadoop Distributed File System is specially designed to be
highly fault-tolerant. The file system replicates, or copies, each piece of data multiple
times and distributes the copies to individual nodes, placing at least one copy on a
different server rack than the others. As a result, the data on nodes that crash can
be found elsewhere within a cluster. This ensures that processing can continue
while data is recovered.
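The sketch below mimics that ingest behavior: the data is cut into fixed-size blocks,
and each block's replicas are assigned so that at least one copy lands on a different
rack. The node names, cluster layout, and simplified placement rule are our own;
HDFS's real placement logic is considerably more involved.

# Split data into blocks and assign rack-aware replica locations.
BLOCK_SIZE = 64 * 2**20              # 64 MB, the classic HDFS default

def split_into_blocks(data, block_size=BLOCK_SIZE):
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(block_id, nodes_by_rack, replication=3):
    # Pick `replication` nodes spanning at least two racks.
    racks = sorted(nodes_by_rack)
    first = racks[block_id % len(racks)]         # rack for the first copy
    other = racks[(block_id + 1) % len(racks)]   # a different rack
    targets = [nodes_by_rack[first][0],
               nodes_by_rack[other][0],
               nodes_by_rack[other][1]]
    return targets[:replication]

cluster = {'rack1': ['node1', 'node2'], 'rack2': ['node3', 'node4']}
demo = b'x' * 150                    # scaled down: pretend bytes are MB
for i, block in enumerate(split_into_blocks(demo, block_size=64)):
    print('block %d (%d "MB") -> %s' % (i, len(block),
                                        place_replicas(i, cluster)))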
HDFS uses a master/slave architecture. In its initial incarnation, each Hadoop
cluster consisted of a single NameNode that managed file system operations and
supporting DataNodes that managed data storage on individual compute nodes. The
HDFS elements combine to support applications with large data sets.
This master node "data chunking" architecture takes as its design guides elements
from the Google File System (GFS), a proprietary file system outlined in Google
technical papers, as well as IBM's General Parallel File System (GPFS), a format
that boosts I/O by striping blocks of data over multiple disks, writing blocks in
parallel. While HDFS is not Portable Operating System Interface (POSIX) compliant, it
echoes POSIX design style in some aspects.
Features of HDFS
• It is suitable for distributed storage and processing.
• Hadoop provides a command interface to interact with HDFS.
• The built-in servers of the namenode and datanode help users easily check the status
of the cluster.
• Streaming access to file system data.
• HDFS provides file permissions and authentication.
HDFS Architecture
Given below is the architecture of a Hadoop File System.
HDFS follows the master-slave architecture and it has the following elements.
Namenode
The namenode is commodity hardware running the GNU/Linux operating
system and the namenode software. The system hosting the namenode acts as the
master server, and it does the following tasks −
• Manages the file system namespace.
• Regulates client’s access to files.
• Executes file system operations such as renaming, closing, and opening files and
directories.
Datanode
The datanode is commodity hardware running the GNU/Linux operating system
and the datanode software. For every node (commodity hardware/system) in a
cluster, there will be a datanode. These nodes manage the data storage of their
system.
• Datanodes perform read-write operations on the file systems, as per client request.
• They also perform operations such as block creation, deletion, and replication according
to the instructions of the namenode.
Block
Generally, user data is stored in the files of HDFS. A file in the file system is
divided into one or more segments, which are stored in individual data nodes. These file
segments are called blocks. In other words, the minimum amount of data that HDFS can
read or write is called a block. The default block size is 64 MB, but it can be increased as
needed by changing the HDFS configuration.
Goals of HDFS
Fault detection and recovery − Since HDFS includes a large number of
commodity hardware components, failure of components is frequent. Therefore HDFS should
have mechanisms for quick and automatic fault detection and recovery.
Huge datasets − HDFS should have hundreds of nodes per cluster to manage the
applications having huge datasets.
Hardware at data − A requested task can be done efficiently when the
computation takes place near the data. Especially where huge datasets are
involved, this reduces network traffic and increases throughput.
Hadoop Framework
Hadoop is an Apache Software Foundation project that processes large volumes of data. It is a
Big Data technology that stores and processes really huge amounts of data by distributing the
data to different nodes.
Hadoop is an Apache open source framework written in java that allows distributed
processing of large datasets across clusters of computers using simple programming
models. The Hadoop framework application works in an environment that provides
distributed storage and computation across clusters of computers. Hadoop is designed to
scale up from single server to thousands of machines, each offering local computation and
storage.
Hadoop is an open source distributed processing framework that manages data processing
and storage for big data applications running in clustered systems. It is at the center of a
growing ecosystem of big data technologies that are primarily used to support advanced
analytics initiatives, including predictive analytics, data mining and machine
learning applications. Hadoop can handle various forms of structured and unstructured data,
giving users more flexibility for collecting, processing and analyzing data than relational
databases and data warehouses provide.
Hadoop and big data
Hadoop runs on clusters of commodity servers and can scale up to support thousands of
hardware nodes and massive amounts of data. It uses a namesake distributed file
system that's designed to provide rapid data access across the nodes in a cluster, plus fault-
tolerant capabilities so applications can continue to run if individual nodes fail.
Consequently, Hadoop became a foundational data management platform for big
data analytics uses after it emerged in the mid-2000s.
Hadoop was created by computer scientists Doug Cutting and Mike Cafarella, initially to
support processing in the Nutch open source search engine and web crawler. After Google
published technical papers detailing its Google File System (GFS)
and MapReduce programming framework in 2003 and 2004, respectively, Cutting and
Cafarella modified earlier technology plans and developed a Java-based MapReduce
implementation and a file system modeled on Google's.
In early 2006, those elements were split off from Nutch and became a separate Apache
subproject, which Cutting named Hadoop after his son's stuffed elephant. At the same time,
Cutting was hired by internet services company Yahoo, which became the first production
user of Hadoop later in 2006. (Cafarella, then a graduate student, went on to become a
university professor.)
Use of the framework grew over the next few years, and three independent Hadoop vendors
were founded: Cloudera in 2008, MapR a year later and Hortonworks as a Yahoo spinoff in
2011. In addition, AWS launched a Hadoop cloud service called Elastic MapReduce in 2009.
That was all before Apache released Hadoop 1.0.0, which became available in December
2011 after a succession of 0.x releases.
Hadoop Architecture
At its core, Hadoop has two major layers, namely −
• Processing/Computation layer (MapReduce), and
• Storage layer (Hadoop Distributed File System).
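To show how these two layers meet in practice, the sketch below revisits the word-count
example in the Hadoop Streaming style, in which the MapReduce layer runs ordinary
programs that read records from standard input and emit tab-separated key-value lines
on standard output, while HDFS supplies the input splits. The --mode flag is our own
convention for packing both roles into one file; the streaming contract itself
(tab-separated pairs, reducer input sorted by key) is standard.

# Word-count mapper and reducer written in the Hadoop Streaming style.
import sys

def map_stream(lines):
    # Mapper: emit one "<word>\t1" line per word seen.
    for line in lines:
        for word in line.split():
            print('%s\t1' % word)

def reduce_stream(lines):
    # Reducer: input arrives sorted by key, so counts accumulate until
    # the key changes.
    current, total = None, 0
    for line in lines:
        word, count = line.rstrip('\n').split('\t')
        if word != current:
            if current is not None:
                print('%s\t%d' % (current, total))
            current, total = word, 0
        total += int(count)
    if current is not None:
        print('%s\t%d' % (current, total))

if __name__ == '__main__':
    if sys.argv[1:] == ['--mode', 'map']:
        map_stream(sys.stdin)
    else:
        reduce_stream(sys.stdin)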