1
OPNFV Summit 2015
Doctor - Fault Management
Gerald Kunzmann, DOCOMO
Carlos Goncalves, NEC
Ryota Mibu, NEC
2
Doctor Overview
• Goal
– Build fault management and maintenance framework
• Approach
– Identify requirements
– Gap Analysis
– Implementation work in Upstream (OpenStack)
– Integration and testing
• Status
– Initial requirement study, architecture design, gap analysis: Done
– Collaborative Development: On-going (3 merged Blueprints in OpenStack Liberty)
– Standardization Sync: On-going (by NFV member efforts, joint meeting)
3
Doctor Members
• At project creation (Dec 2014)
– NTT DOCOMO, Sprint
– NEC, Nokia, Ericsson, Huawei, ClearPath Network, Cisco
• Now (Oct 2015)
– NTT DOCOMO, Sprint, AT&T, Telecom Italia, KDDI
– NEC, Nokia, Ericsson, Huawei, ClearPath Network, Cisco, Cloudbase Solutions, Spirent, Intel, ZTE
• Membership growth since project creation: about 2x
4
Assumption of VNF (NFV Application)
• Telco applications are basically deployed in active-standby or active-active fashion
– App (Active) and App (Standby), each in its own VM on its own machine
• App and App Manager (VNFM) cannot detect HW failures directly
• App state will be switched when a failure occurs
5
Use Case 1: Fault management
• Consumers: Consumer C1, Consumer C2, Consumer C3
• Virtualized Infrastructure Manager (VIM), e.g. OpenStack, with OpenStack Northbound Interface
• Resource Map
– Server – VM mapping: Server S1: VM-1, VM-2; Server S2: VM-7; Server S3: VM-4
– Ownership information: VM-1, VM-7: Consumer C1; VM-2: Consumer C2; VM-4: Consumer C3
• Resource Pool: Hardware Servers S1, S2, S3, each with a Hypervisor, hosting VM-1, VM-2, VM-7, VM-4
• Sequence
– 1. Fault Monitoring: hardware fault, hypervisor fault, host OS fault
– 2. Inform the Consumer? If YES, find owner of affected VMs from database
– 3. FaultNotification (VM ID, Fault ID)
– 4. Switch to SBY configuration
– 5. Instruction (VM ID)
– 6. Execute Instruction, e.g. migrate VM
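Steps 2 and 3 of the use case above amount to two lookups in the Resource Map. The following is a minimal sketch of that lookup, using the server, VM and consumer names from the slide. The function name `fault_notifications`, the dictionary layout and the fault ID string are assumptions made for illustration; they are not part of Doctor or OpenStack.

```python
# Illustrative sketch only: names are hypothetical, not Doctor or OpenStack APIs.
from collections import defaultdict

# Server - VM mapping, as listed on the slide.
SERVER_TO_VMS = {
    "S1": ["VM-1", "VM-2"],
    "S2": ["VM-7"],
    "S3": ["VM-4"],
}

# Ownership information, as listed on the slide.
VM_OWNER = {
    "VM-1": "C1",
    "VM-7": "C1",
    "VM-2": "C2",
    "VM-4": "C3",
}


def fault_notifications(server, fault_id):
    """Return {consumer: [(vm_id, fault_id), ...]} for a fault on `server`."""
    notifications = defaultdict(list)
    # Step 2: find the affected VMs, then the consumer owning each of them.
    for vm_id in SERVER_TO_VMS.get(server, []):
        # Step 3: one FaultNotification (VM ID, Fault ID) per affected VM.
        notifications[VM_OWNER[vm_id]].append((vm_id, fault_id))
    return dict(notifications)


# A fault on Server S1 affects VM-1 (Consumer C1) and VM-2 (Consumer C2).
print(fault_notifications("S1", "HW-FAULT"))
```

Consumers that own no VM on the faulty server receive nothing, which is the point of keeping ownership information in the VIM.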
6
Use Case 2: Maintenance
• Actors: Administrator; Consumer C1, Consumer C2, Consumer C3
• Virtualized Infrastructure Manager (VIM), e.g. OpenStack, with OpenStack Northbound Interface
• Resource Map
– Server – VM mapping: Server S1: VM-1, VM-2; Server S2: VM-7; Server S3: VM-4
– Ownership information: VM-1, VM-7: Consumer C1; VM-2: Consumer C2; VM-4: Consumer C3
• Resource Pool: Hardware Servers S1, S2, S3, each with a Hypervisor, hosting VM-1, VM-2, VM-7, VM-4
• Sequence
– 1. Maintenance Request (Server S3)
– 2. Which VMs are affected? Find Consumer owning the VM(s) from the database.
– 3. Maintenance Notification (VM ID)
– 4. Switch to SBY configuration
– 5. Instruction (VM ID)
– 6. Execute Instruction, e.g. migrate VM
7
Fault Management Sequence
• Applications: App, App, App
• VIM User and Administrator
• Virtualized Infrastructure Manager (VIM) = OpenStack
• Virtualized Infrastructure
– Virtual Compute, Virtual Storage, Virtual Network
– Virtualization Layer
– Hardware Resources
• Doctor Scope: Detection, Reaction
8
Key Requirements as VIM
• Immediate Notification
• Consistent Resource State Awareness
• Extensible Monitoring
• Fault Correlation
9
Doctor Architecture and Typical Scenario
• Components: Application, Manager, Notifier, Controller, Inspector, Monitor
• Data: Alarm Conf., Resource Map, Failure Policy
• Virtualized Infrastructure (Resource Pool)
• Typical scenario
– 0. Set Alarm
– 1. Raw Failure
– 2. Find Affected
– 3. Update State
– 4. Notify all
– 5. Notify Error
– 6-. Action
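The scenario steps above can be sketched as a small in-memory pipeline. This is only one plausible reading of the diagram: which component performs which step, and every class and method name below, are assumptions made for illustration rather than actual Doctor or OpenStack code.

```python
# Illustrative sketch only: class and method names are hypothetical.

class Notifier:
    def __init__(self):
        self.alarms = []  # (resource_id, callback) pairs: the alarm configuration

    def set_alarm(self, resource_id, callback):
        # Step 0: a manager registers interest in a resource.
        self.alarms.append((resource_id, callback))

    def notify(self, resource_id, state):
        # Step 5: notify the error to every manager with a matching alarm.
        for alarm_resource, callback in self.alarms:
            if alarm_resource == resource_id:
                callback(resource_id, state)


class Controller:
    def __init__(self, resource_map, notifier):
        self.resource_map = resource_map  # host -> list of VM ids
        self.state = {}
        self.notifier = notifier

    def update_state(self, resource_id, state):
        # Step 3: record the new state; step 4: publish the change.
        self.state[resource_id] = state
        self.notifier.notify(resource_id, state)


class Inspector:
    def __init__(self, controller, failure_policy):
        self.controller = controller
        self.failure_policy = failure_policy  # raw failure type -> resource state

    def on_raw_failure(self, host, failure_type):
        # Step 1: a monitor reports a raw failure on a host.
        state = self.failure_policy.get(failure_type)
        if state is None:
            return  # the policy does not treat this failure as a fault
        # Step 2: find the affected resources via the resource map.
        for vm_id in self.controller.resource_map.get(host, []):
            self.controller.update_state(vm_id, state)


received = []
notifier = Notifier()
controller = Controller({"S1": ["VM-1", "VM-2"]}, notifier)
inspector = Inspector(controller, {"nic_down": "error"})

notifier.set_alarm("VM-1", lambda rid, state: received.append((rid, state)))
inspector.on_raw_failure("S1", "nic_down")
print(received)
```

In this sketch the manager's callback is where step 6 (the recovery action) would start, for example by switching to the standby instance.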
10
Doctor OSS Map
• Architecture and scenario steps (0. Set Alarm to 6-. Action) as on the previous slide
• Mapping of components to open source software
– Notifier: Ceilometer
– Controller: Nova, Neutron, Cinder
– Monitor: e.g. Zabbix
– Inspector: e.g. Monasca
11
Doctor OSS Development
• Architecture and scenario steps (0. Set Alarm to 6-. Action) as on the architecture slide
• Development items per component
– Notifier: Ceilometer (Event Alarm)
– Controller: Nova (State Correction), Neutron, Cinder
– Monitor: e.g. Zabbix
– Inspector: e.g. Monasca
12
Doctor Blueprints in Liberty Cycle
Project | Blueprint | Spec Drafter | Developer | Status
Ceilometer | Event Alarm Evaluator | Ryota Mibu (NEC) | Ryota Mibu (NEC) | ✓ Completed (Liberty)
Nova | New nova API call to mark nova-compute down | Tomi Juvonen (Nokia) | Roman Dobosz (Intel) | ✓ Completed (Liberty)
Nova | Support forcing service down | Tomi Juvonen (Nokia) | Carlos Goncalves (NEC) | ✓ Completed (Liberty)
Nova | Get valid server state | Tomi Juvonen (Nokia) | | Spec approved (Mitaka)
Nova | Add notification for service status change | Balazs Gibizer (Ericsson) | Balazs Gibizer (Ericsson) | Waiting for spec approval (Mitaka)
13
Doctor BP Detail: Nova – Mark Nova-Compute Down
• Host / Machine: Hypervisor, VM, nova compute, vSwitch, BMC
• Nova services: nova api, nova conductor, nova scheduler, nova DB, queue
• External Monitoring Service, Monitoring Client
• EXISTING (periodic update): service state
• NEW: Force-down API to update nova-compute service state
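As a rough illustration of how an external monitoring service might call the force-down API, the sketch below builds (but does not send) the HTTP request. The path `/os-services/force-down`, the body fields and compute API microversion 2.11 reflect the Nova API as released with Liberty to the best of my knowledge and should be checked against the Nova API reference. The endpoint URL, the token and the host name are placeholders, and `build_force_down_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_force_down_request(nova_endpoint, token, host, forced_down=True):
    """Build a PUT request that marks the nova-compute service on `host` as down.

    Assumption: the request shape follows the force-down call of the Nova
    os-services API (microversion 2.11). Verify against the API reference.
    """
    body = json.dumps({
        "host": host,
        "binary": "nova-compute",
        "forced_down": forced_down,
    }).encode("utf-8")
    return urllib.request.Request(
        url=nova_endpoint.rstrip("/") + "/os-services/force-down",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # placeholder, normally issued by Keystone
            "X-OpenStack-Nova-API-Version": "2.11",
        },
    )


# Placeholder endpoint, token and host name; nothing is sent over the network.
req = build_force_down_request(
    "http://controller:8774/v2.1", "TOKEN-PLACEHOLDER", "compute-1"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Building the request separately from sending it keeps the sketch testable without a running OpenStack deployment; passing the request to `urllib.request.urlopen` would send it.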
14
Doctor BP Detail: Ceilometer - Event Alarm
• Notification sources: Cinder, Neutron, Nova
• Data: notification, sample, stats, event
• Paths: EXISTING (polling-based); NEW Shortcut (notification-based)
• New component: Notification-driven alarm evaluator
• Other elements: Manager, Audit Service
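The idea of the notification-based shortcut can be sketched as an evaluator that checks alarm rules the moment an event arrives, instead of polling statistics at an interval. This is not Ceilometer code: the rule and event layouts and the function names are assumptions made for illustration, and `compute.instance.update` is used only as an example event type.

```python
# Illustrative sketch only: not Ceilometer code; data layouts are hypothetical.

def matches(rule, event):
    """True if the event has the rule's type and satisfies every trait condition."""
    if event["event_type"] != rule["event_type"]:
        return False
    return all(
        event["traits"].get(name) == value
        for name, value in rule["traits"].items()
    )


def evaluate_on_notification(alarms, event):
    """Evaluate all alarms as soon as one event arrives; no polling interval."""
    return [alarm["name"] for alarm in alarms if matches(alarm["rule"], event)]


alarms = [{
    "name": "vm-1-error",
    "rule": {
        "event_type": "compute.instance.update",
        "traits": {"instance_id": "VM-1", "state": "error"},
    },
}]

event = {
    "event_type": "compute.instance.update",
    "traits": {"instance_id": "VM-1", "state": "error"},
}

# The alarm fires immediately for the matching event.
print(evaluate_on_notification(alarms, event))
```

With a polling-based evaluator the delay is bounded by the polling interval; here it is bounded only by how quickly the notification is delivered.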
15
Doctor Southbound API
• Actors: User, Admin, NFVI
• Components: Controller, Inspector, Notifier, Monitor
• Settings: Conf., Policy, Threshold, Enable
• Unified Event API: Configuration, Fault Messaging
16
Doctor Status
• Components
– Notifier: Ceilometer
– Monitor: Zabbix, DPDK
– Controller: Nova, Neutron, Cinder
– Inspector: Monasca?
• Legend: Done, Next Step
• Phases: To-Be Arch. Design, Gap Analysis, Blueprint, Coding, Integration, OPNFV Release
• Timeline: Dec 2014, Mar 2015, Sep 2015, Feb 2016
17
Don’t miss out...
• “Doctor – Fault Management” Project Theater, Wednesday, 3:55 pm – 4:15 pm
• “Doctor: Failure Detection and Notification for NFV” DOCOMO booth, PoC Demo Zone