Defining Service Level Agreements

COMICS (COMputer for Interaction and CommunicationS) Research Group – DIS, University of Napoli Federico II. Defining Service Level Agreements. Giorgio Ventre, The COMICS Research Group @ The University of Napoli Federico II. Corso di Reti di Calcolatori


Page 1: Defining Service Level Agreements

COMICS (COMputer for Interaction and CommunicationS) Research Group – DIS, University of Napoli Federico II

Defining Service Level Agreements

Giorgio Ventre

The COMICS Research Group @ The University of Napoli Federico II

Corso di Reti di Calcolatori II (Computer Networks II course)

Page 2: Defining Service Level Agreements

Professional & Business Challenges

Excessive alarms: A medium-sized operations center receives 100,000 to 1,000,000 alarms per day.

Constant changes: New or upgraded devices and new services launch frequently.

Complex services structure: Services are vital for business and customer interaction, but they are not really managed.

Page 3: Defining Service Level Agreements

Professional & Business Challenges

Customer interaction: Operators must handle customer complaints, customer care, selling services and service-level agreements (SLAs).

Cuts in operations costs: A small team must run a large, multifaceted network.

Difficult interface integration: Diverse equipment and support systems make managing interface integration a challenge.

Page 4: Defining Service Level Agreements

Professional & Business Challenges

The bulk of network administrators’ daily work involves alarms.

Such a large number of alarms indicates that many signals are irrelevant and uncorrelated.

It’s therefore hard to understand the true state of problems in the network.

Today’s alarms are more or less raw warnings from the different equipment and vendor-specific management systems.

Page 5: Defining Service Level Agreements

Professional & Business Challenges

Operators establish a multi-level organization that handles the alarms.

The first line has three major tasks:

• check for alarms that indicate the same problem

• group the alarms and attach them to a trouble ticket

• distribute problem information to affected parties, such as SLA customers and customer care.
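The three first-line tasks can be sketched in a few lines of Python. This is only an illustration: the `Alarm` and `TroubleTicket` types, the idea that every alarm already carries a normalized `problem` key, and the `affected_parties` lookup table are all assumptions made for the example, not something the slides define.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Alarm:
    source: str       # device that raised the alarm (hypothetical field)
    problem: str      # normalized problem key (hypothetical field)
    timestamp: float  # seconds since an arbitrary epoch


@dataclass
class TroubleTicket:
    problem: str
    alarms: list = field(default_factory=list)
    notified: list = field(default_factory=list)


def open_tickets(alarms, affected_parties):
    """Group alarms that indicate the same problem and attach each group to one ticket."""
    grouped = defaultdict(list)
    for alarm in alarms:
        grouped[alarm.problem].append(alarm)

    tickets = []
    for problem, members in grouped.items():
        ticket = TroubleTicket(problem=problem)
        ticket.alarms = sorted(members, key=lambda a: a.timestamp)
        # Parties to inform, e.g. SLA customers or customer care.
        ticket.notified = list(affected_parties.get(problem, []))
        tickets.append(ticket)
    return tickets
```

In practice the hard part is producing the `problem` key, which is exactly the alarm-correlation problem discussed in the following slides.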

Page 6: Defining Service Level Agreements

Professional & Business Challenges

If it’s a simple problem, the first line resolves it and closes the ticket.

If it’s a complex problem, the first line dispatches it to the second- and third-line organizations. This might involve:

• equipment vendors

• operator staff in the field who might perform onsite management, card replacement, and so on

Page 7: Defining Service Level Agreements

Professional & Business Challenges

Automatic trouble ticketing manages the workflow from problem identification to problem solution.

An alarm’s context determines whether it affects services, customer SLAs, or the equipment’s state.

We need alarm correlation, but:

• Alarm quality is insufficient

• We lack an overall network topology

• Correlation knowledge is spread across the organization and over several domain experts.

Page 8: Defining Service Level Agreements

Professional & Business Challenges

Operators are looking for service management solutions and SLA management solutions.

We need to solve several underlying problems:

• Topology management: network topology, service topology, and the mapping between these.

• Service management: formal but dynamic management of services and SLAs.

• Service-centric integration and modeling.

Page 9: Defining Service Level Agreements

Professional & Business Challenges

A basic tool is the Service Tree.

But it is also critical to be able to map the important dependencies:

• We need correlation software!

• We need formalisms!

• We need Knowledge Management!
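One way to picture a service tree with mapped dependencies is as a graph that can be walked upwards from a failed resource to the services that rely on it. The sketch below is an assumption about how such a structure might be represented (a plain dictionary mapping each node to the nodes it depends on); the node names are invented for the example.

```python
from collections import defaultdict, deque


def affected_by(dependencies, failed):
    """Return every node that depends, directly or indirectly, on `failed`.

    `dependencies` maps a node to the list of nodes it depends on.
    """
    # Invert the mapping: for each node, who depends on it?
    dependents = defaultdict(set)
    for node, needs in dependencies.items():
        for need in needs:
            dependents[need].add(node)

    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical service tree: services at the top, equipment at the bottom.
example_tree = {
    "voip": ["vpn-gold", "sip-server"],
    "vpn-gold": ["mpls-core"],
    "mpls-core": ["router-7"],
    "sip-server": [],
}
```

With this tree, `affected_by(example_tree, "router-7")` yields the MPLS core, the VPN and the VoIP service, which is the kind of answer an operator needs in order to relate an equipment alarm to customer SLAs.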

Page 10: Defining Service Level Agreements

The concept of Service

[Figure: Supplier and Customer, linked by the value provided (goods, services, …)]

Page 11: Defining Service Level Agreements

New business model

In traditional telephony, a customer was just a number (SAP) and a well-defined service:

• Quality used to be controlled statistically

• Faults were rare and mostly at exchanges

• …the good old days of monopolists

Today we have multiple services with different possible configurations:

• We now need to link customers to services

Page 12: Defining Service Level Agreements

Definitions

A service is the interface between supplier and customer:

• Service Access Point (SAP): where the service is accessed (my home, your company’s sites…)

• Customer Service Management (CSM): how the customer interacts with the supplier

Page 13: Defining Service Level Agreements

Service life-cycle

• Design (definition, marketing)

• Negotiation (a business interaction)

• Provisioning (implementation & test)

• Usage

– Operation

– Change

• Deinstallation (end of supply)

Page 14: Defining Service Level Agreements

TMN: Telecommunication Management Network Model

Page 15: Defining Service Level Agreements

TMN: Telecommunication Management Network Model

Page 16: Defining Service Level Agreements

TOM: Telecom Operations Map

Page 17: Defining Service Level Agreements

eTOM

Page 18: Defining Service Level Agreements

Service-related processes

• Service Creation

• Service Provisioning

• Service Activation

• Service Assurance

• Service Monitoring

• Service Accounting

Page 19: Defining Service Level Agreements

Business scenario at-a-glance

[Diagram: Service Creation, Service Provisioning, Service Activation, Service Assurance, Service Monitoring, Service Accounting, Network planning and provisioning, Network Inventory, Customer Care, Billing]

Page 20: Defining Service Level Agreements

Service Management integration

Integration with all main business processes:

• Order Management

• Service Creation (Marketing)

• Network Configuration

• Network Management

• Service Monitoring (SLA)

• Billing

• Customer Care

Page 21: Defining Service Level Agreements

A formal definition

• A Service Level Agreement is defined as a contract between the service provider and the customer that specifies the QoS level that can be expected for that service.

• It includes technical elements like the expected behavior of the service, the parameters for QoS verification, the devices involved…

• But it also includes legal components: costs, obligations, compensations…

Page 22: Defining Service Level Agreements

A formal definition

The Internet Engineering Task Force’s differentiated services (DiffServ) working group defines an SLA in networking parlance as follows:

• An SLA is a service contract between a customer and a service provider that specifies the forwarding service a customer should receive (RFC 2475).

The SLA contains both technical and nontechnical terms and conditions. The technical specification of the transport service is given in service level specifications (SLSs):

• An SLS is a set of parameters and their values which together define the service offered to a traffic stream by a DiffServ domain (RFC 3260).

Page 23: Defining Service Level Agreements

Research Issues on SLA

• SLA parameter definitions

– This concerns the definition of service level parameters such as availability, reliability, latency, and loss for SLA.

– Ongoing work on common standards

• SLA negotiation

– Can we automatically stipulate and negotiate SLAs?

• SLA measurement

– This issue deals with how to accurately measure the QoS that service providers deliver to their customers.

– Quite a mature research area

Page 24: Defining Service Level Agreements

Research Issues on SLA

• SLA compliance reporting

– This deals with mechanisms to satisfy increasingly sophisticated customers who demand real-time reporting to confirm that they are receiving the service levels they were promised.

– Many things already done

• QoS management

– This issue deals with how to manage and control the QoS delivered to customers to ensure compliance with established SLAs.

– Lot of work to be done

• Languages for SLA definition and validation

– Can we define Service Level Agreements automatically and correctly?

Page 25: Defining Service Level Agreements

SLA Parameter Definition

• The QoS team of the TeleManagement Forum has been working on the automation of the interface between service providers and customers for performance reporting with the SLA concept.

• They have identified common terms and definitions, and have created an industry-wide glossary for performance measurement and reporting.

• NMF, “Performance Reporting Definition Document”, NMF 701, June 1998

Page 26: Defining Service Level Agreements

SLA Parameter Definition

• The IP Performance Metrics (IPPM) Working Group of the Internet Engineering Task Force (IETF) has been working on the identification of Internet service metrics:

– Framework for IP Performance Metrics (RFC 2330)

– IPPM Metrics for Measuring Connectivity (RFC 2678)

– A One-Way Delay Metric for IPPM (RFC 2679)

– A One-Way Packet Loss Metric for IPPM (RFC 2680)

– A Round-Trip Delay Metric for IPPM (RFC 2681)

Page 27: Defining Service Level Agreements

Components of an SLA

• A description of the nature of the service to be provided

• The expected performance level of the service (i.e. reliability and responsiveness)

• The procedure for reporting problems

• The time frame for response and problem resolution

• The process for monitoring and reporting the service

• Consequences for not meeting expectations

• Escape clauses and constraints

Page 28: Defining Service Level Agreements

Describing a Service

• This part of an SLA includes the type of service to be provided and any qualifications of that type of service.

• In the context of IP network connectivity, the type of service may specify the maintenance of network connectivity.

• It may include additional functions such as operation and maintenance of domain name servers, dynamic host configuration protocol servers, etc.

• It also includes information about the location(s) where the service is to be provided.

Page 29: Defining Service Level Agreements

Describing a Service

For Internet services, the definition can be quite complex and can include several components:

• Application: security, access, configuration, upgrades, resource utilisation, response time…

• System: all info related to system health

• Network: everything related to the data transport, such as technologies, QoS, traffic conditioning and profiles, VPNs, encryption…

Page 30: Defining Service Level Agreements

Describing performance levels

The Quality of Service (QoS) parameters for the communication network specify the minimum requirements for:

• Network accessibility

• Network availability

• Network performance (capacity, delay etc.)

• Network operation and maintenance

Page 31: Defining Service Level Agreements

Describing performance levels

One simple thing: to agree on the parameters to measure…

Shall we go for ITU parameters or for IETF stuff?

ITU: world basically made of SONET and optical lines, bits or cells

IETF: world basically made of packets

Page 32: Defining Service Level Agreements

Describing performance levels

• To estimate and verify the quality of the various components in the network, a number of measurements are specified in internationally agreed standards.

• The ITU Recommendations G.821 and G.826 specify a set of communication line parameters for SDH networks, primarily based on bit error rates and derived numbers.

• The values will be part of the SLA between the end user and the network service provider.

Page 33: Defining Service Level Agreements

Describing performance levels

• Recommendation G.821 has the following definitions:

• Errored second (ES): a one-second time interval in which one or more bit errors occur.

• Severely errored second (SES): a one-second time interval in which the bit error rate exceeds 10^-3.

• Unavailable second (US): a circuit is considered to be unavailable from the first of at least 10 consecutive SES. The circuit is available again from the first of at least 10 consecutive seconds which are not SES.

• Degraded minute (DM): a one-minute time interval in which the bit error rate exceeds 10^-6.

• Error-free second (EFS): a one-second time interval without any bit errors.

• In Recommendation G.826, similar definitions are specified at the block level.
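The G.821 definitions above translate fairly directly into code. The following is a minimal sketch, assuming the measurement equipment delivers one bit-error count per second and that the line bit rate is known; the function names and the input format are assumptions for the example, and the sketch does not attempt to reproduce every detail of the Recommendation.

```python
def summarize_g821(bit_errors_per_second, bit_rate):
    """Count ES, SES and EFS from per-second bit-error counts.

    Returns the counts and a per-second list of SES flags.
    """
    es = ses = efs = 0
    ses_flags = []
    for errors in bit_errors_per_second:
        # BER in this second exceeds 10^-3 (integer form of errors / bit_rate > 0.001).
        severe = errors * 1000 > bit_rate
        ses_flags.append(severe)
        if errors == 0:
            efs += 1
        else:
            es += 1  # an SES is also an errored second
        if severe:
            ses += 1
    return {"ES": es, "SES": ses, "EFS": efs}, ses_flags


def unavailable_flags(ses_flags, window=10):
    """Mark each second as unavailable (True) or available (False).

    Unavailability starts at the first of at least `window` consecutive SES and
    ends at the first of at least `window` consecutive non-SES seconds.
    """
    n = len(ses_flags)
    flags = [False] * n
    unavailable = False
    i = 0
    while i < n:
        # Find the end of the current run of identical SES / non-SES seconds.
        run_end = i
        while run_end < n and ses_flags[run_end] == ses_flags[i]:
            run_end += 1
        if run_end - i >= window:
            # A long enough run switches the state from its first second.
            unavailable = ses_flags[i]
        for k in range(i, run_end):
            flags[k] = unavailable
        i = run_end
    return flags
```

Note that a short burst of SES (fewer than 10 in a row) does not make the circuit unavailable, and a short recovery does not make it available again, which is why the state is tracked across runs.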

Page 34: Defining Service Level Agreements

Describing performance levels

• Recommendation G.826 has the following definitions:

• Errored second (ES): a one-second time interval containing one or more errored blocks.

• Errored block (EB): a block containing one or more errored bits.

• Severely errored second (SES): a one-second time interval in which more than 30% of the blocks are errored.

• Unavailable second (US): as for G.821.

• Background block error (BBE): an errored block that is not part of an SES.

• A measurement time interval has to be specified, and the derived ratios for ES, SES and BER are the base for the QoS parameters.

• The recommended measurement time for G.821 and G.826 is 30 days.
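The block-based G.826 definitions can be sketched in the same way. The input format below (a pair of errored and total blocks for each second) and the function name are assumptions for the example; the ratios are computed over all supplied seconds, without excluding unavailable time.

```python
def summarize_g826(seconds):
    """Summarize block-based error performance.

    `seconds` is a list of (errored_blocks, total_blocks) pairs, one per second.
    """
    es = ses = bbe = 0
    for errored, total in seconds:
        if errored == 0:
            continue
        es += 1
        # More than 30% of the blocks in this second are errored.
        if errored * 10 > total * 3:
            ses += 1
        else:
            # Errored blocks outside an SES count as background block errors.
            bbe += errored
    n = len(seconds)
    return {
        "ES": es,
        "SES": ses,
        "BBE": bbe,
        "ES ratio": es / n,
        "SES ratio": ses / n,
    }
```

For a 30-day measurement interval the list would hold 2,592,000 entries, so a real collector would more likely keep running counters than store every second.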

Page 35: Defining Service Level Agreements

Describing performance levels

• Recommendation I.356 has the following definitions:

• Cell Loss Ratio (CLR): the number of cells lost divided by the number of cells transmitted.

• Cell Error Ratio (CER): the number of errored cells divided by the number of cells transmitted.

• Cell Misinsertion Rate (CMR): the number of wrongly inserted cells in a specified time interval.

• Cell Transfer Delay (CTD): the time from when a cell enters a device under test until it leaves the device.

• Mean Cell Transfer Delay (MCTD): the arithmetic mean of a number of CTD values in a specified period.

• Cell Delay Variation (CDV): the degree of variation in the cell transfer delay (CTD) of a virtual connection.
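As a small worked example, the cell-level ratios and delay figures above can be computed from raw counters as follows. The function name and inputs are assumptions, and because the slide only describes CDV as a "degree of variation", the peak-to-peak difference used here is just one possible choice.

```python
from statistics import mean


def cell_metrics(cells_sent, cells_lost, cells_errored, transfer_delays_s):
    """Compute CLR, CER, mean CTD and a peak-to-peak CDV from raw measurements."""
    return {
        "CLR": cells_lost / cells_sent,
        "CER": cells_errored / cells_sent,
        "mean CTD": mean(transfer_delays_s),
        # Assumption: CDV taken as the spread between slowest and fastest cell.
        "CDV": max(transfer_delays_s) - min(transfer_delays_s),
    }
```

For instance, 5 lost cells out of 1000 transmitted gives a CLR of 0.005.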

Page 36: Defining Service Level Agreements

Describing performance levels

For IP-based services, performance levels might be related to a packet-based, routed world:

• Average delay measured monthly across the ISP network between any two access routers should be less than 200 ms.

• The average delay across the ISP network on the transcontinental link between the New York City access router of the customer and the London, U.K., access router of the customer would be less than 250 ms.
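A check for clauses of this kind might look like the sketch below. Treating the two example clauses as one SLA with a default limit and a per-pair override is an assumption made purely for illustration, as are the router names and the input format.

```python
from statistics import mean

DEFAULT_LIMIT_MS = 200.0
# Hypothetical per-pair override for the transcontinental link.
PAIR_LIMITS_MS = {frozenset({"nyc", "london"}): 250.0}


def delay_violations(monthly_samples):
    """Return the router pairs whose monthly average delay is not below its limit.

    `monthly_samples` maps (router_a, router_b) to a list of delay samples in ms.
    """
    violations = {}
    for (a, b), samples in monthly_samples.items():
        limit = PAIR_LIMITS_MS.get(frozenset({a, b}), DEFAULT_LIMIT_MS)
        average = mean(samples)
        if average >= limit:
            violations[(a, b)] = average
    return violations
```

Because the clause is about a monthly average, a single slow sample does not violate it; only the mean over the reporting period counts.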

Page 37: Defining Service Level Agreements

Describing performance levels

For IP-based services, the method of measurement is sometimes also specified:

• The customer will not have unscheduled connectivity disruption across the ISP network between any two access routers exceeding 5 min. Connectivity disruption would be defined as the loss of 100% of packets as measured by pinging an access router from a machine connected to another access router.
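Given a log of ping probes, the clause above can be checked by looking for the longest run of consecutive lost probes. The sketch assumes a time-ordered list of `(timestamp_seconds, reply_received)` pairs; measuring a disruption from the first lost probe to the next successful one is also an assumption, since the clause does not say how to bound an outage between probes.

```python
def longest_disruption(probes):
    """Return the longest continuous disruption, in seconds, seen in the probe log."""
    longest = 0.0
    start = None       # time of the first lost probe in the current run
    last_time = None
    for timestamp, ok in probes:
        if not ok and start is None:
            start = timestamp
        elif ok and start is not None:
            longest = max(longest, timestamp - start)
            start = None
        last_time = timestamp
    if start is not None:
        # The log ends while the disruption is still ongoing.
        longest = max(longest, last_time - start)
    return longest


def violates_clause(probes, limit_s=300.0):
    """True if any disruption exceeds the agreed limit (5 min by default)."""
    return longest_disruption(probes) > limit_s
```

The probing interval matters: with one ping per minute, the measured duration can be off by up to a minute, which is worth stating in the SLA itself.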

Page 38: Defining Service Level Agreements

Describing performance levels

Page 39: Defining Service Level Agreements

Example: JANET, UK

Network availability for at least:

• Availability of 99.6% to more than 90% of clients

• Availability of 99% to more than 96.5% of clients

• Availability of 97% to more than 98.5% of clients

• Availability of 93% to more than 99.5% of clients

Mean time between failures of the service of at least:

• 1000 hours provided to 99% of clients

The target rate is less than 0.001 incidents per hour, calculated each month by dividing the number of failures in the best 99% of access points by the number of access points and the number of hours in the month.
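Both targets lend themselves to a simple calculation. The sketch below assumes one availability percentage and one monthly failure count per client or access point; the function names are invented, and dividing by the number of access points kept in the best 99% (rather than by all of them) is an interpretation of the wording, not something the slide states. A rate of 0.001 incidents per hour is the reciprocal of the 1000-hour MTBF target.

```python
# (availability target in %, share of clients in % that must exceed it)
TIERS = [(99.6, 90.0), (99.0, 96.5), (97.0, 98.5), (93.0, 99.5)]


def tiers_met(client_availability):
    """For each tier, check that more than the required share of clients reach it."""
    n = len(client_availability)
    results = []
    for target, required_share in TIERS:
        reached = sum(1 for a in client_availability if a >= target)
        results.append(100.0 * reached / n > required_share)
    return results


def incident_rate(failures_per_access_point, hours_in_month):
    """Incidents per hour over the best 99% of access points."""
    counts = sorted(failures_per_access_point)
    keep = len(counts) * 99 // 100  # drop the worst 1% (rounded down)
    best = counts[:keep]
    return sum(best) / (len(best) * hours_in_month)
```

The tiered form lets the provider promise a high availability to most clients while still bounding the experience of the worst-served ones.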

Page 40: Defining Service Level Agreements

Example: JANET, UK

Page 41: Defining Service Level Agreements

Example: JANET, UK

Page 42: Defining Service Level Agreements

Example: JANET, UK

Page 43: Defining Service Level Agreements

Example: JANET, UK

Page 44: Defining Service Level Agreements

Example: JANET, UK

End-to-end latency between any pair of clients for 128-octet packets, measured from the time of entry onto the first access line of the last bit of the packet to the time of exit from the second access line of the first bit of the packet, of less than a stated target time, which depends on the transmission technology used, for 95% of transmissions over any thirty-minute period.

Clients shall normally expect to be able to transmit and receive traffic (from a number of sources) which, over any thirty-minute period, uses at least 40% of the nominal capacity of their access line, once the overheads of the data solely concerned with the transmission technology in use have been discounted.
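The latency target is a percentile condition over thirty-minute periods. A minimal sketch follows, with two simplifying assumptions: the clause says "any thirty-minute period", which strictly calls for a sliding window, while the code uses fixed consecutive windows; and the target time is passed in as a parameter because the slide does not give a value.

```python
from collections import defaultdict


def windows_meeting_target(samples, target_ms, window_s=1800, share_percent=95):
    """Check, per fixed window, that enough transmissions stay below the target.

    `samples` is a list of (timestamp_seconds, latency_ms) pairs.
    Returns {window_index: True/False}.
    """
    buckets = defaultdict(list)
    for timestamp, latency in samples:
        buckets[int(timestamp // window_s)].append(latency)

    results = {}
    for index, latencies in sorted(buckets.items()):
        within = sum(1 for value in latencies if value < target_ms)
        # Integer comparison avoids floating-point rounding at the 95% boundary.
        results[index] = within * 100 >= share_percent * len(latencies)
    return results
```

A stricter check would slide the window one sample at a time, at the cost of more computation per reporting period.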

Page 45: Defining Service Level Agreements

Example: JANET, UK

Performance Indicators and Service Levels for Domain Name Service:

• Availability of the primary name server for the target domain of 99.5%

• Availability of service from an available officially supported name server of 99.95%

Performance Indicators and Service Levels for NTP Time Service:

• This service is intended for use by access points in constructing their own distributed time services (RFC 1305).

• Availability of each time reference of 98%

• MTBF of 800 hours

Page 46: Defining Service Level Agreements

Describing performance levels

For hosting services we can have several types of contracted facilities:

• Boxes in specific locations (PoPs, IXPs): box availability is the customer’s task (local access)

• Boxes + uptime (i.e. maintenance): the boxes are kept up and running. All the software inside is the customer’s responsibility (remote access)

• Boxes + uptime + applications: the customer just needs to access the service (remote usage)

Examples are Web hosting, CDNs, data centres

Page 47: Defining Service Level Agreements

Describing performance levels

Typical performance and availability clauses:

• The hosted server will not be unavailable for a contiguous period exceeding 5 min in any 24-h period. Unavailability is defined as the inability to ping the server from a machine with network connectivity to the hosting provider’s access router.

• The hosted server will be able to handle inbound traffic of 30 000 Web requests per day.

Page 48: Defining Service Level Agreements

Describing performance levels

Typical performance and availability clauses:

• The hosted application will be provided access to the Internet at a bandwidth of 45 Mb/s or more.

• The service provider will ensure that there are at least five servers available and running the application at all times.

If we host multiple customers at the same site we are responsible for ensuring that the performance of one customer’s server is not adversely affected by requests directed to other customers.

• See AKAMAI, IBM “Business on Demand”, …

• Resource control? Virtualisation?

Page 49: Defining Service Level Agreements

Describing performance levels

Today’s trend is to outsource entire IT services or even departments.

In this case we have an integrated services offer and the performance clauses are at the overall system level:

• The time to perform an employee lookup on the corporate directory would not exceed 500 ms.

• The average performance of a standard synthetic Web-based transaction, as reported by probes located at selected locations, will not exceed 100 ms.

• Unscheduled downtime of the mail server will not exceed a 30-min period during the normal business day of 9 A.M. to 5 P.M.
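The mail-server clause only counts downtime that falls inside business hours, so an outage has to be clipped to the 9 A.M. to 5 P.M. window before it is compared with the 30-minute limit. The sketch below shows one way to do that with naive `datetime` values; the function name is an assumption, and weekends and holidays are not excluded, which a real contract would need to address.

```python
from datetime import datetime, time, timedelta


def business_overlap(start, end, opens=time(9), closes=time(17)):
    """Return how much of the outage [start, end) falls within business hours."""
    total = timedelta()
    day = start.date()
    while day <= end.date():
        window_start = datetime.combine(day, opens)
        window_end = datetime.combine(day, closes)
        overlap_start = max(start, window_start)
        overlap_end = min(end, window_end)
        if overlap_end > overlap_start:
            total += overlap_end - overlap_start
        day += timedelta(days=1)
    return total


def within_clause(start, end, limit=timedelta(minutes=30)):
    """True if the business-hours part of the outage does not exceed the limit."""
    return business_overlap(start, end) <= limit
```

An overnight outage from 16:50 to 09:15 the next morning therefore counts as 25 minutes, not more than 16 hours.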

Page 50: Defining Service Level Agreements

Describing Customer Support

This section includes the typical helpdesk guarantees on problem reporting and problem resolution.

Examples include a single point of contact assigned to the customer and problem resolution within 48 hours of reporting.

Sometimes it also indicates the SAP (e.g. a toll-free number or a web service).

Page 51: Defining Service Level Agreements

SLA Measurement

SLA measurement is quite an active area of R&D.

We can have different approaches depending on the component we need to monitor:

• Sampling: very effective and reliable for application- and system-level parameters

• Traps/alarms: logging all troubles and faults

Big issues for network parameters: how, when & where

Page 52: Defining Service Level Agreements

SLA Measurement