Service Assurance for the Virtualizing and Software-Defined Networks
Cisco Knowledge Network Presentation
Paola Arosio, Deepak Bhargava
Product Management, Cloud and Virtualization
October 7, 2015
2 © 2014 Cisco and/or its affiliates. All rights reserved.
Agenda
NFV Adoption Trends
Required Attributes
Current OSS Limitations and Challenges
The New Approach to Service Assurance
Summary, Contact and Resources
NFV Adoption Trends
Source: Heavy Reading “NFV and Service Assurance Report”, June 2015
Current OSS Limitations and Challenges
OSS Limitations and Challenges for Service Assurance
§ 73% of problems are reported by end users
§ Yet only 2% of users with a bad experience complain
By inference:
§ Existing systems detect <1% of performance issues
§ ~97% of issues are neither detected nor reported
[Charts: customer-reported issues 73% vs. provider-detected issues 27%; customers who don’t complain 98% vs. customers who complain 2%]
Source: Forrester Study
Source: Gartner “how to approach customer experience management”
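The inference on this slide can be reproduced with simple arithmetic. The sketch below assumes (the slide does not say so explicitly) that the 2% of affected users who complain correspond to the 73% customer-reported share of known issues, so the two figures can be combined. Variable names are illustrative.

```python
# Assumption: complaints from the 2% of affected users make up the 73%
# "customer reported" share of all issues that anyone knows about.
complain_rate = 0.02             # users with a bad experience who complain
customer_reported_share = 0.73   # share of known issues reported by end users

# All issues known to the provider, as a fraction of all bad experiences.
known_issues = complain_rate / customer_reported_share
# The remaining 27% of known issues were detected by the provider itself.
provider_detected = known_issues * (1 - customer_reported_share)
# Everything else was neither detected nor reported.
undetected_unreported = 1 - known_issues

print(f"provider detected: {provider_detected:.2%}")
print(f"neither detected nor reported: {undetected_unreported:.2%}")
```

Under that assumption the provider-detected share comes out below 1% and the unseen share at roughly 97%, matching the slide.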
NFV and SDN Introduce Further Complexities
Cloud, Elastic Compute, SDN and NFV: Virtualized Networks and Compute
Transmission and Mobile
[Diagram layers: Application, NFV, Service]
“This stuff is complex. I don’t know if the fault is me or them”
“We need to detect issues before the end users call”
“Everyone in Support is working in isolation of each other”
Disassociated Apps & Infrastructure: DevOps, NFV, Apps
Disassociated Monitoring and Aggregation Tools: Alert Management / Log Management / APM / Performance Analytics
Disassociated People Silos: Owner Operated, Outsourcer Operated, 3P Service
NFV and SDN Require a New Approach to Service Assurance
Comparison by CSP Business Requirement: Traditional Assurance Approach vs. Cisco’s Approach to Service Assurance (Aided by orchestration)
End-to-end service and customer experience focus
Traditional:
• Resource centric
• Service state calculated as an after-thought based on low-level KPIs
Cisco:
• Service model based SLA decomposition and cross-domain data aggregation
Rapid Service Creation
Traditional:
• An after-thought
Cisco:
• Coupling of orchestration and assurance
• Service model-driven assurance
Visibility into dynamic Infrastructure
Traditional:
• Bottoms-up modeling
• Rules-based approach
Cisco:
• Subscription to VI/VNF changes and dependency mapping
• Self-healing and optimization enabled by feedback loop
Managing both virtual and physical resources
Traditional:
• Static model for dedicated resources
Cisco:
• Hybrid analytics & model based approach for service impact and root cause analysis
Seamless integration with OSS/BSS current environment
Traditional:
• Segmented & specialized tools and operations
Cisco:
• Modular and layered architecture delivering deployment flexibility
• Open horizontally scalable data platform
• Decoupling of publishing from consumption layer - break silos without disrupting operations
Poll Question: What are your top three requirements for next-generation service assurance? (choose top 3)
A New Approach to Service Assurance
Service Assurance: Key Tenets
Deliver reliable services and a consistent user experience
• Cross-domain and Multi-vendor: End-to-end visibility across multiple domains (i.e. CPE to WAN to NFV and Cloud)
• Multi-layer: Correlated views across various layers - service, virtual, and physical
• Orchestration Integration: Provision service assurance at the time of service instantiation
• Self Healing: Policy-based automation that combines visibility and analytics to control and optimize
• Out-of-Box Content: Pre-defined content for supporting use-cases
• External Integration: Based on open APIs
Example: Cloud VPN
[Diagram labels: x86; L2/L3 CPE (ISR, NID); Hosted DC; vWSA; vFW; vWAAS; Branch A; Branch B; VPN (IPSec or IPv6; L2TPv3; CPE; Storage; Compute; WAN; NFV; CSR1kV; NAT]
Modular Architecture: Breaks silos and enables vertical specialization
Service Management Problem/Incident Management
Service Health Impact Analysis
Service Quality Assessment
Optimization
Service/Workload Placement
Events Logs Metrics
Network, Compute, Storage (Physical, Virtual) Operations Business
Domain Specific
Cross-Domain
Capacity Planning & Forecasting
Fault and Cause Analysis
Customer Portal Operator Portal Executive Dashboard
Layers: Collection, Analysis, Presentation
Service Health Dashboard, SLA Dashboard, Routing & Reservation Console
Event Analysis
Distribution Bus
Metric Analysis
BSS/OSS Mediation
Billing Mediation
Legacy OSS GWs
• Loosely coupled, yet tightly integrated with Service Orchestration
• Analysis layer covering specific capabilities: RCA, SIA, SLA
• Modular approach to allow analytics plug-ins
• Modular and layered architecture allows for reuse of existing collection mechanisms and integration with existing customer and third-party OSS applications
• Horizontally scalable platform to collect data from different sources and publish data to consumers
• Out-of-box content to support specific use cases
• Use common YANG service model for service and assurance descriptors definition
Log Analysis
Cross-Domain Orchestrator
Fulfillment
VMS VPC GiLAN Other …
Poll Question: Which Service Assurance functions are gating your NFV deployment?
On a scale from 1 (not important) to 5 (critically important), please rank the importance of having the following service assurance analytics functions in place when FIRST deploying NFV.
Service Assurance Architecture Required Attributes
1. Open and modular, aligned with Big Data framework
2. Service model-driven assurance
3. Analytics based OSS functions applied across physical and virtual infrastructure
Open and Modular Architecture
Open and Modular Architecture: Enables flexible deployment
Service Management Problem/Incident Management
Service Health Impact Analysis
Service Quality Assessment
Optimization
Service/Workload Placement
Events Logs Metrics
Network, Compute, Storage (Physical, Virtual) Operations Business
Domain Specific
Cross-Domain
Capacity Planning & Forecasting
Fault and Cause Analysis
Customer Portal Operator Portal Executive Dashboard
Layers: Collection, Analysis, Presentation
Service Health Dashboard, SLA Dashboard, Routing & Reservation Console
Event Analysis
Distribution Bus
Metric Analysis
BSS/OSS Mediation
Billing Mediation
Legacy OSS GWs
Log Analysis
INSTRUMENTATION
NETWORK SERVICE MANAGEMENT
SERVICE ASSURANCE
Brownfield
Mix Brown-Green Field
Greenfield
How Service Assurance is Realized Today
§ Tight coupling of data aggregation/store/analysis – pipelines realised in products
§ Multi-stage processing, both at aggregation and analysis
§ Architecture is a function of product choices
§ Filtering on aggregation to reduce data volume
§ Analysis functions largely based on programmatic rules, derived topology models and static thresholds
[Diagram labels, by stage]
Data sources: SNMP stats, SNMP traps, Logs
Data aggregation: Polling, Event aggregation, Log Aggregation
Data store: Stats, Events, Logs
Data analysis: Perf Analysis, Fault Analysis, Log Index/Search
Outputs: Dashboard & Reporting (shown three times)
Analytics in the Loop(s): Self-healing and optimization enabled by feedback loop
Big Data based Service Assurance Analytics enables self-healing and optimization
OSS functions can be expressed as operations against the entire OSS data set:
§ Fault management = ƒ(event data, metric data)
§ Performance management = ƒ(metric data)
§ Billing mediation = ƒ(event data, metric data)
§ Capacity management = ƒ(metric data)
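As a rough illustration of OSS functions expressed as operations over one shared data set, the sketch below defines three of them as plain Python functions over the same event and metric records. The record fields, metric name `cpu_util`, thresholds and resource names are all hypothetical; the slide only states which data each function consumes.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass(frozen=True)
class Event:
    resource: str
    severity: str   # hypothetical values, e.g. "critical" or "minor"

@dataclass(frozen=True)
class Metric:
    resource: str
    name: str       # hypothetical metric name, e.g. "cpu_util"
    value: float

def fault_management(events, metrics, cpu_limit=90.0):
    """f(event data, metric data): resources with a critical event or a CPU breach."""
    from_events = {e.resource for e in events if e.severity == "critical"}
    from_metrics = {m.resource for m in metrics
                    if m.name == "cpu_util" and m.value > cpu_limit}
    return from_events | from_metrics

def performance_management(metrics):
    """f(metric data): mean value per (resource, metric name)."""
    grouped = {}
    for m in metrics:
        grouped.setdefault((m.resource, m.name), []).append(m.value)
    return {key: mean(values) for key, values in grouped.items()}

def capacity_management(metrics, headroom_limit=80.0):
    """f(metric data): resources whose mean CPU utilisation exceeds the limit."""
    averages = performance_management(metrics)
    return {res for (res, name), avg in averages.items()
            if name == "cpu_util" and avg > headroom_limit}

# One shared data set, consumed by every function.
events = [Event("vfw-1", "critical"), Event("csr-1", "minor")]
metrics = [Metric("vfw-1", "cpu_util", 55.0),
           Metric("csr-1", "cpu_util", 95.0),
           Metric("csr-1", "cpu_util", 85.0)]

print(fault_management(events, metrics))
print(capacity_management(metrics))
```

Because every function reads the same records, adding a new function (billing mediation, for example) would not require a new collection pipeline.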
Big Data Architecture
Consumers: Data analysis applications
• Performance Analytics
• Fault Analysis
• SLA Reporting
• Incident & Problem Management
• Log Search
• Capacity Analytics
• Billing (Mediation)
• Business Intelligence
• Security and Threat Analysis
Orchestration, Controllers, Customer, Devices, Applications, QoE Monitoring
• Infrastructure and service-level data
• Customer-level data
Open Data Platform
• Data Distribution
• Data Store & Processing: Master Data Store, Batch Processing, Stream processing, Live stream, Real Time Data Store
• Deep Historical Query
• Real Time Query
Publishers: Data aggregation
• Event aggregation
• Log Aggregation
• Metric aggregation
• Network Telemetry
Benefits
§ An open system architecture with no dependency on any specific vendor or product
§ Allows any analytics application to mine any data source, leveraging the full value of the OSS dataset
§ Extensible – add new OSS analytics functions quickly and seamlessly with minimum development cost
§ Minimizes duplicate polling – collect data once, use many times
§ Removes cross-system integration
§ Leverages rapid innovation in the Big Data analytics space
§ Platform is extensible beyond OSS
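The "collect data once, use many times" benefit can be sketched with a minimal in-memory publish/subscribe bus. This is only an illustration of the decoupling between publishers and consumers; the class name `DistributionBus`, the topic name and the record fields are hypothetical, and a real deployment would use a distributed messaging system rather than this toy.

```python
from collections import defaultdict

class DistributionBus:
    """Minimal in-memory publish/subscribe bus (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callable that receives every record published on topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, record):
        """Deliver one record to every subscriber of topic."""
        for handler in self._subscribers[topic]:
            handler(record)

bus = DistributionBus()
perf_samples, capacity_samples = [], []

# Two hypothetical consumers read the same metric stream independently.
bus.subscribe("metrics", perf_samples.append)
bus.subscribe("metrics", capacity_samples.append)

# A single publisher polls the device once and publishes the sample once.
bus.publish("metrics", {"resource": "csr-1", "cpu_util": 72.5})

print(len(perf_samples), len(capacity_samples))
```

Adding a third consumer is a single `subscribe` call; the publisher and the existing consumers are unchanged.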
Service Assurance Functional Baseline
Metric Data
Event data
Real-time inventory
Performance Analytics
SLA Reporting
Incident & Problem Management
Service Health
Real-Time Service Health
Time-series analytics
Event analytics
Views
Incident UI
Service Status Dashboard
Event Console
Orchestration
Controllers
Customer
Devices
Applications
QoE Monitoring
Infrastructure and service-level data
Customer-level data
Data Platform
Fault Analytics
Service Model Driven Assurance
Service Assurance is a Service Lifecycle Problem
Day-0: Before service provisioning
Day-1: During service provisioning
Day-2: After service provisioning
Service Level Definition
• Service availability
• Loss, latency, jitter, …
SLA Definition
• What SLA is required?
Service Placement
• Where can it be supported?
• Admission Control
• Workload Placement
Orchestration (Service Provisioning)
• Put it there
Service Assurance (Service Management & Operations)
• Verify the service is available and how it is performing
• Scale-up/-down based upon load
• Local recovery actions if the VNF is unavailable/underperforming
• Identify underlying causes and fix them asap
Service Monitoring; Service Elasticity and Availability
• Service Availability
• Monitoring
• Reporting
• Service Elasticity and Availability
• Performance Mgmt
• Service Level Monitoring
• Fault management {cause analysis, impact analysis}
• Incident / problem mgmt.
• Remediation
SLA Definition: Services are different…
§ No generic KPI measurement component for all service types
§ Smart components (i.e., probes) per service-type often needed
§ Focus on Service-Type KPI definition and direct Service KPI monitoring
Service Level Definition
[Diagram: generic KPIs and service-type specific KPIs feed f(), which yields the SLA status]
• SLA Status: Violated, Jeopardized
• Generic KPIs: Latency; Availability, Uptime; …
• Service-type specific KPIs: e.g. MOS, App Response Time, MDI, vMOS
Service types: VoIP, Video on Demand, Video Streaming, Exchange, Sharepoint, WiFi, Cloud VPN, iWAN, IaaS
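One way to read the f() on this slide is as a function that maps measured KPIs to an SLA status. The sketch below is an assumption-heavy illustration: the KPI names, the target values, the 10% jeopardy margin and the extra "OK" status are all hypothetical, and every KPI is treated as "lower is better" to keep the logic short.

```python
def sla_status(measured, targets, jeopardy_margin=0.1):
    """Return "Violated", "Jeopardized" or "OK" for upper-bound KPI targets.

    measured and targets map KPI name -> value; lower is better for every KPI.
    jeopardy_margin is a hypothetical tuning knob: a KPI within 10% of its
    target counts as jeopardized.
    """
    status = "OK"
    for kpi, limit in targets.items():
        value = measured[kpi]
        if value > limit:
            return "Violated"          # any single breach violates the SLA
        if value > limit * (1 - jeopardy_margin):
            status = "Jeopardized"     # close to the limit, keep checking others
    return status

# A generic KPI (latency) plus a hypothetical VoIP-specific KPI, expressed as
# MOS degradation so that lower is better.
voip_targets = {"latency_ms": 150.0, "mos_degradation": 1.0}

print(sla_status({"latency_ms": 80.0, "mos_degradation": 0.4}, voip_targets))
print(sla_status({"latency_ms": 140.0, "mos_degradation": 0.4}, voip_targets))
print(sla_status({"latency_ms": 160.0, "mos_degradation": 0.4}, voip_targets))
```

A different service type would supply its own target dictionary while reusing the same function, which mirrors the split between generic and service-type specific KPIs.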
End to End Service Assurance
WAN
Managed CPEs, Virtualized Data Center Infrastructure
Virtual Network Functions
WAN Service
IaaS #1
CPE Service VNF Service #1
End to End Service
IaaS #3
VNF #3
e2e Service
VNF Service IAAS Service WAN Service CPE Service
IaaS #2
VNF Service #2 VNF #4
SLA attributes (examples): Availability: 99.99% Response Time: < 100ms Throughput: > 1 Gbps
Network/Compute/Storage
Dashboard/Reports
Service, SLA & Policy Definition
Monitoring Policy Definition
Analytics (………..)
Cross-Domain, Multi-Vendor
Collection
Visibility-Policy-Control Feedback Loop
Control/Configuration changes
Orchestration Assurance
Domain Specific SLA Decomposition (RFS)
Instrumentation Provisioning
Service Model Creation / Change
Feedback Loop
Service Provisioning
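Domain-specific SLA decomposition can be illustrated with a small worked calculation. The sketch below assumes, purely for illustration, that the end-to-end service is a serial chain of four independent domains (CPE, WAN, VNF, IaaS), that the availability target is split equally, and that response time is additive across domains. None of these assumptions come from the slides, and the per-domain latencies are made-up numbers.

```python
import math

def end_to_end_availability(domain_availability):
    """Availability of a chain of domains that must all be up (assumed independent)."""
    return math.prod(domain_availability.values())

def equal_availability_budget(target, domains):
    """Split an end-to-end availability target equally across serial domains."""
    per_domain = target ** (1 / len(domains))
    return {name: per_domain for name in domains}

def response_time_within_target(domain_latency_ms, target_ms):
    """Response time is treated as the sum of per-domain latencies."""
    return sum(domain_latency_ms.values()) < target_ms

domains = ["CPE Service", "WAN Service", "VNF Service", "IaaS Service"]

# Each domain must be more available than the 99.99% end-to-end target.
budget = equal_availability_budget(0.9999, domains)
print({name: round(value, 6) for name, value in budget.items()})
print(round(end_to_end_availability(budget), 6))

# Hypothetical per-domain latencies checked against the < 100 ms target.
latencies = {"CPE Service": 10, "WAN Service": 40, "VNF Service": 25, "IaaS Service": 15}
print(response_time_within_target(latencies, 100))
```

The point of the exercise is that each domain's target has to be tighter than the end-to-end figure, which is why decomposition is done from the service model rather than per resource.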
Service Assurance Auto Enablement: Leverage YANG and Orchestration Engine for Service Assurance Provisioning
§ Extend service model to describe service intent and SLA descriptors
§ Leverage full power of YANG model to support the service assurance parameters
Dashboard/Reports
Service, SLA & Policy Definition
Monitoring Policy Definition
Analytics (………..)
Cross-Domain, Multi-Vendor Collection
Instrumentation Provisioning
Domain Specific SLA Decomposition (RFS)
Network/Compute/Storage
Service Model Extension
Configure collection systems
Auto-provisioning of instrumentation, test capabilities and probes
Inventory Models to facilitate SIA and RCA
Remediation feedback
Service Provisioning
Visibility-Policy-Control Feedback Loop
Service Assurance Orchestration
NSO
VNFD
(V)NF VNFM Monitoring Systems Reporting
Defines scale and availability parameters for VNF which are managed locally by VNFM; determines VNF monitoring by VNFM
Defines VNF and associated SLA
Configures required instrumentation on that VNF, e.g. syslog, SNMP, etc.
Real-time inventory
Container {……. Leaf {…….
List {… } } }
Service Model
Service Provisioning
Configures reporting system so that it knows how to interpret monitoring data, e.g. roll-up calculations, alter thresholds etc.
Configures required monitoring on monitoring system for that VNF, e.g. what to poll for and when
Share Service Definition and infrastructure context for analytics
Activation Test: validates that the service works
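The idea of one service model driving instrumentation, monitoring and reporting can be sketched as a single function that derives all three configurations. This is not NSO or YANG code: the dictionary layout, field names and values below are hypothetical stand-ins for a service model with SLA descriptors.

```python
# Hypothetical service model, loosely mirroring a container/list/leaf layout.
service_model = {
    "service": "cloud-vpn",
    "sla": {"availability_pct": 99.99, "response_time_ms": 100},
    "vnfs": [
        {"name": "vfw-1", "instrumentation": ["syslog", "snmp"], "poll_interval_s": 60},
        {"name": "csr-1", "instrumentation": ["snmp"], "poll_interval_s": 30},
    ],
}

def derive_assurance_config(model):
    """Derive per-system configuration from one service model."""
    # What to enable on each VNF (e.g. syslog, SNMP).
    instrumentation = {v["name"]: list(v["instrumentation"]) for v in model["vnfs"]}
    # What the monitoring system should poll, and how often.
    monitoring = [{"target": v["name"], "interval_s": v["poll_interval_s"]}
                  for v in model["vnfs"]]
    # How the reporting system should interpret the data (thresholds from the SLA).
    reporting = {"service": model["service"], "thresholds": dict(model["sla"])}
    return {"instrumentation": instrumentation,
            "monitoring": monitoring,
            "reporting": reporting}

config = derive_assurance_config(service_model)
print(config["monitoring"])
print(config["reporting"]["thresholds"])
```

Because all three outputs come from the same model, a change to the service definition updates assurance at the same time as provisioning.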
Analytics based OSS Functions
Hybrid Analytics and Model-based Approach: Expedites Service Impact and Root Cause Analysis
WAN Service
IaaS #1
CPE Service VNF Service #1
IaaS #3
VNF #3
e2e Service
VNF Service IAAS Service WAN Service CPE Service
IaaS #2
VNF Service #2 VNF #4
Resource/Domain Manager
Resource/Domain Manager
Resource/Domain Manager
Model Based
Events Logs Metrics
Distribution Bus
Collection
Analysis
Cross-Domain Orchestrator
Service Model
Analytics Based
Data Enrichment
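The model-based half of service impact analysis can be sketched as a walk over a dependency graph. The edges below are hypothetical, loosely based on the labels in the diagram (e2e Service, VNF Service, IaaS, and so on); the real service model would come from the orchestrator.

```python
from collections import deque

# Hypothetical "depends on" edges, loosely based on the diagram labels.
depends_on = {
    "e2e Service": ["CPE Service", "WAN Service", "VNF Service #1", "VNF Service #2"],
    "VNF Service #1": ["IaaS #1"],
    "VNF Service #2": ["IaaS #2"],
}

def impacted_by(failed, depends_on):
    """Walk dependency edges upward to find everything affected by a failure."""
    # Invert the edges: for each component, who depends on it?
    dependents = {}
    for parent, children in depends_on.items():
        for child in children:
            dependents.setdefault(child, []).append(parent)
    # Breadth-first search from the failed component towards the services.
    impacted, queue = set(), deque([failed])
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, []):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return impacted

print(sorted(impacted_by("IaaS #1", depends_on)))
```

Running the same walk in the opposite direction, from an impacted service down to its resources, gives the candidate set for root cause analysis, which analytics can then rank.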
Analytics-based Fault and Situation Analysis: Contextualized Alert Clusters Reduce Time to Detect, Diagnose and Resolve
Sources: Network, Compute, Apps, Sentiment
Events → Filtered Alerts → Situations → Notifications → Situation Rooms
• Real-time, Automatic Data Categorization and Noise Filtering (Unsupervised Machine Learning)
• Real-time, Automatic Anomaly Detection; Alert Groups (Unsupervised Machine Learning)
• Real-time, Automatic Situation Awareness; Dynamic Teaming (Unsupervised Machine Learning)
Clean, Contextualize, Close
Analytics-based Fault and Situation Analysis: Contextualized Alert Clusters Reduce Time to Detect, Diagnose and Resolve
Events
Event Enrichment
Significance Ranking
Event Filtering Situations
Situation Enrichment
Situation Prioritization
Orchestration
Collaboration
Automation
Knowledge
Syslog Translation
SNMP Translation
Stakeholder Notification
NSO Ticketing System
Clean Contextualize Close
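The flow from raw events to situations can be approximated with a very simple heuristic: drop repeated alerts, then group the remainder by resource and time proximity. This is a rule-based stand-in for illustration only, not the unsupervised machine learning the slides refer to; the event fields and the 60-second window are hypothetical.

```python
def filter_and_cluster(events, window_s=60):
    """Drop duplicate alerts, then group alerts on the same resource within window_s."""
    seen, situations = set(), []
    for ev in sorted(events, key=lambda e: e["ts"]):
        key = (ev["resource"], ev["msg"])
        if key in seen:            # noise filtering: repeated identical alert
            continue
        seen.add(key)
        for sit in situations:
            same_resource = sit["resource"] == ev["resource"]
            if same_resource and ev["ts"] - sit["last_ts"] <= window_s:
                sit["alerts"].append(ev["msg"])
                sit["last_ts"] = ev["ts"]
                break
        else:
            # No open situation matched, so start a new one.
            situations.append({"resource": ev["resource"],
                               "alerts": [ev["msg"]],
                               "last_ts": ev["ts"]})
    return situations

events = [
    {"ts": 0, "resource": "vfw-1", "msg": "link down"},
    {"ts": 5, "resource": "vfw-1", "msg": "link down"},      # duplicate, filtered
    {"ts": 20, "resource": "vfw-1", "msg": "bgp peer lost"},
    {"ts": 30, "resource": "csr-1", "msg": "high cpu"},
    {"ts": 500, "resource": "vfw-1", "msg": "fan failure"},  # too late to join
]

for situation in filter_and_cluster(events):
    print(situation["resource"], situation["alerts"])
```

Five raw events collapse into three situations here, which is the kind of reduction the Clean and Contextualize steps aim for before notification.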
Analytics-based Fault and Situation Analysis: Enhance Operational Efficiency with Collaboration and Knowledge Base
Knowledge Base
• Historical: an algorithm-generated similarity factor identifies similar past problems
• User comments entered during troubleshooting and resolution of past problems help resolve the current instance of a recurring problem
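The slides do not say how the similarity factor is computed. As one plausible stand-in, the sketch below uses Jaccard similarity between the alert sets of the current situation and of past problems, and returns the stored user comment of the best match. The incident identifiers, alerts and comments are invented for the example.

```python
def jaccard(a, b):
    """Jaccard similarity of two alert collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical knowledge base: past problem -> (alerts seen, resolution comment).
knowledge_base = {
    "INC-1": (["link down", "bgp peer lost"], "Replaced faulty optic on uplink"),
    "INC-2": (["high cpu", "memory leak"], "Restarted VNF after patch"),
}

def most_similar(current_alerts, knowledge_base):
    """Return (incident id, similarity, comment) of the closest past problem."""
    best_id = max(knowledge_base,
                  key=lambda k: jaccard(current_alerts, knowledge_base[k][0]))
    alerts, comment = knowledge_base[best_id]
    return best_id, jaccard(current_alerts, alerts), comment

current = ["link down", "bgp peer lost", "high cpu"]
incident, score, comment = most_similar(current, knowledge_base)
print(incident, round(score, 2), comment)
```

Any other set or text similarity measure could be swapped in; the point is that the score surfaces the past problem, and the attached user comment carries the fix.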
Summary
Service Assurance: Attributes for effective operationalization of virtual infrastructures
An ideal solution must:
§ Leverage YANG service models as a bridge between orchestration and assurance
§ Provide a horizontal, scalable big data platform and real-time data collection based on "publish/subscribe" principles
§ Incorporate analytics functions across both physical and virtual elements
§ Enable self-healing through closed-loop feedback
Poll Question: Service Assurance in Hybrid Physical and Virtual Infrastructures
Contacts and Resources
Cisco Contacts
Americas • Moti Beharav: [email protected]
EMEAR • Brett Holmes: [email protected]
APJC • Andrew Eaton: [email protected]
Resources: For More Information
Heavy Reading White Paper: The Role of Service Assurance in the Virtualizing Network
Cisco Evolved Services Platform www.cisco.com/go/esp
Cisco Service Management and Orchestration Software Portfolio www.cisco.com/go/servicemano