1 Doctor Fault Management 18 May 2015 Ryota Mibu, NEC.
1
Doctor Fault Management
18 May 2015
Ryota Mibu, NEC
2
Doctor Overview
• One of the OPNFV requirement projects (requirement identification, gap analysis, implementation study)
• Goal
– Build a fault management and maintenance framework for high availability of Network Services on top of virtualized infrastructure
– Make the framework valuable and acceptable to other industries as well
• Status
– Initial requirement study, architecture design, gap analysis: Done (see document [link])
– Collaborative development: Started (blueprints have been proposed to Nova and Ceilometer)
– Standardization sync: Ongoing (through NFV member efforts and joint meetings)
3
Use Case 1: Fault management
4
Use Case 2: Maintenance
5
High Level Architecture
Virtualized Infrastructure
Applications
VIM User and Administrator
Virtualized Infrastructure Manager (VIM) = OpenStack
Virtual Compute
Virtual Storage
Virtual Network
Virtualization Layer
Hardware Resources
App App App
6
Fault Management Sequence
Virtualized Infrastructure
Applications
VIM User and Administrator
Virtualized Infrastructure Manager (VIM) = OpenStack
Virtual Compute
Virtual Storage
Virtual Network
Virtualization Layer
Hardware Resources
App App App
Detection
Reaction
Doctor Initial Focus
8
Key Requirements as VIM
Immediate Notification
Consistent Resource State Awareness
Extensible Monitoring
Fault Correlation
9
TO-BE: Functional Blocks
Virtualized Infrastructure
Applications
VIM User and Administrator
VIM
Virtual Compute
Virtual Storage
Virtual Network
Virtualization Layer
Hardware Resources
App App App
Notifier
Monitor
Controller
Inspector
10
Fault Management Scenarios (1/2)
Virtualized Infrastructure
Monitor (multiple instances)
Inspector
Controller (multiple instances)
Notifier
Applications
User-side Manager
Admin-side Manager
Alarm Conf.
Resource Map
Failure Policy
0. Set Alarm
1. Raw Failure
2. Find Affected
3. Update State
4. Notify all
4. (alt) Notify
5. Notify Error
6-. Action
11
Fault Management Scenarios (2/2)
Virtualized Infrastructure
Monitor (multiple instances)
Inspector
Controller (multiple instances)
Notifier
Applications
User-side Manager
Admin-side Manager
Alarm Conf.
Resource Map
Failure Policy
0. Set Alarm
1. Raw Failure
2. Find Affected
3. Update State
4. Notify all
4. (alt) Notify
5. Notify Error
6-. Action
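The numbered flow on the two scenario slides (raw failure, find affected, update state, notify) can be sketched in Python. This is only an illustration under stated assumptions: the class and method names (`Inspector`, `set_alarm`, `on_raw_failure`), the dictionary used as the Resource Map, and the callback-style notification are hypothetical and are not taken from Doctor, Nova, or Ceilometer.

```python
from collections import defaultdict


class Inspector:
    """Hypothetical sketch of steps 0-4 on the scenario slides."""

    def __init__(self, resource_map):
        # Resource Map: physical host -> virtual resources running on it.
        self.resource_map = resource_map
        # Resource state as the Controller would report it (all "active" at start).
        self.state = {vm: "active" for vms in resource_map.values() for vm in vms}
        # Alarm Conf.: virtual resource -> callbacks registered by users.
        self.alarms = defaultdict(list)

    def set_alarm(self, vm, callback):
        """Step 0: a user-side manager registers interest in a resource."""
        self.alarms[vm].append(callback)

    def on_raw_failure(self, host):
        """Steps 1-4: handle a raw failure reported by a Monitor."""
        affected = list(self.resource_map.get(host, []))  # 2. Find Affected
        for vm in affected:
            self.state[vm] = "error"                      # 3. Update State
        for vm in affected:
            for notify in self.alarms[vm]:                # 4. Notify
                notify(vm, self.state[vm])
        return affected


received = []
inspector = Inspector({"host-a": ["vm-a1", "vm-a2", "vm-a3"], "host-b": ["vm-b1"]})
inspector.set_alarm("vm-a1", lambda vm, state: received.append((vm, state)))
affected = inspector.on_raw_failure("host-a")  # 1. Raw Failure
```

In this sketch the state is updated before any notification is sent, so a user who queries the state right after being notified would already see the error state. That ordering follows the step numbers on the slide; a real implementation may differ.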
12
AS-IS: OpenStack Kilo (1/3)
• How can you find faults as a tenant user?
– Keep-alive check on each VM
– Poll VM state through the Nova API
– Set an alarm on the metering service (e.g. CPU runtime)
13
AS-IS: OpenStack Kilo (2/3)
• How does the metering service work?
1. A resource controller such as Nova monitors resource usage [periodically]
2. The metering service gets samples from the resource controller and stores them in the DB [periodically]
3. The metering service evaluates alarm definitions against the samples [periodically]
4. An alarm is raised depending on the result of the evaluation
Machine
Hypervisor
VM
Nova
Ceilometer
(Heat)
Samples
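The four steps can be mimicked with a small polling loop. This is a sketch only: the function names, the in-memory list standing in for the DB, and the `cpu_time` values are hypothetical, and none of this is Ceilometer code. The alarm rule (CPU time not advancing between two samples) is borrowed from the demo slides.

```python
def collect_sample(db, read_cpu_time, now):
    """Steps 1-2: read usage from the resource controller and store it."""
    db.append((now, read_cpu_time(now)))


def evaluate_alarm(db):
    """Step 3: alarm if CPU time did not advance between the last two samples."""
    if len(db) < 2:
        return "insufficient data"
    (_, previous), (_, latest) = db[-2], db[-1]
    return "alarm" if latest - previous == 0 else "ok"


def run(read_cpu_time, ticks, interval):
    """Drive the pipeline; return the time at which the alarm is raised, or None."""
    db = []
    for tick in range(ticks):
        now = tick * interval
        collect_sample(db, read_cpu_time, now)
        if evaluate_alarm(db) == "alarm":  # Step 4: raise the alarm
            return now
    return None


# Hypothetical VM whose CPU time stops advancing at t = 25 (the fault).
def cpu_time(now):
    return min(now, 25)


raised_at = run(cpu_time, ticks=10, interval=10)
```

With these made-up numbers the fault at t = 25 is flagged at t = 40: the sample at t = 30 still shows some progress, so a second post-fault sample is needed. The point of the sketch is that detection delay in a polling design is tied to the polling interval, not that these values match any real deployment.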
14
AS-IS: OpenStack Kilo (3/3)
• Notification
– OpenStack components post events to a messaging queue
– Ceilometer collects, transforms, and publishes those events, which can be used for billing
NFVI
Neutron
Ceilometer
(Billing)
Samples
Nova
Cinder
Queue
15
Implementation Plan in OpenStack
Ceilometer
Virtualized Infrastructure
Applications
Zabbix
VIM User and Administrator
Error Injection
Plugin ?
Event Alarm
Immediate Notification
Queue
Inspector
Nova
16
Demo (1/3)
• User Scenario
Web Server
Web Server
Web Server
Load Balancer
HTTP Clients
HTTP Clients
HTTP Clients
Public Net Private Net
Launch New VM
17
Demo (2/3)
• Demo 1
Machine
Hypervisor
VM
Nova
Ceilometer (Agent, Alarm)
(Heat)
Samples
1. Collect CPU time samples
2. Alarm Heat if CPU runtime = 0
3. Create New Web Server
• Demo 2
Machine
Hypervisor
VM
Nova
Ceilometer (Agent, Alarm)
(Heat)
1. Hook
2. Notify as Event
3. Alarm Heat
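In contrast with the sample-based alarm of Demo 1, Demo 2 reacts to a notification as soon as it arrives. A minimal sketch of that idea follows, assuming a hypothetical in-process event bus: `EventAlarm`, `EventBus`, and the event type strings `example.host.down` and `example.other` are illustrative names, not the Ceilometer, Nova, or Heat API.

```python
class EventAlarm:
    """Fire an action as soon as a matching event is published (no polling)."""

    def __init__(self, event_type, action):
        self.event_type = event_type
        self.action = action

    def matches(self, event):
        return event.get("event_type") == self.event_type


class EventBus:
    """Stand-in for the messaging queue between the agent and the alarm."""

    def __init__(self):
        self.alarms = []

    def subscribe(self, alarm):
        self.alarms.append(alarm)

    def publish(self, event):
        """2. Notify as Event: deliver the event and run matching alarm actions."""
        fired = 0
        for alarm in self.alarms:
            if alarm.matches(event):
                alarm.action(event)  # 3. Alarm Heat (a callback stands in for Heat)
                fired += 1
        return fired


launched = []
bus = EventBus()
bus.subscribe(EventAlarm("example.host.down",
                         lambda e: launched.append("new web server for " + e["host"])))

# 1. Hook: the agent turns a detected failure into an event.
ignored = bus.publish({"event_type": "example.other", "host": "host-a"})
fired = bus.publish({"event_type": "example.host.down", "host": "host-a"})
```

Because the action runs inside `publish`, there is no evaluation period between the event and the reaction; that is the only property this sketch is meant to show.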
18
Demo (3/3) Results
• Demo 1: 90 sec
• Demo 2: 26 sec
19
Doctor Southbound API
User
NFVI
Conf.
Policy
Controller
Inspector
Notifier
Admin
Conf.
Monitor
Configuration
Fault Messaging
Unified Event API
Monitor
Monitor
Threshold
Enable
Enable
20
Case 1: Obvious Fault
User
NFVI
Conf.
Policy
Controller
Inspector
Notifier
Admin
Conf.
Monitor
Zabbix
BMC
(Inspector)
Nova
Ceilometer
User
Configuration
Fault Messaging
SNMP Trap(Power-off)
HTTP POST(Host A down)
HTTP POST(Host A down, VM A1-A3 down)
HTTP POST(VM A1 down)
HTTP POST(Alert: VM A1 down)
HTTP POST(Create Alarm)
Enable
Enable
21
Case 2: Threshold Exceeded Fault (Admin Config)
User
NFVI
Conf.
Policy
Controller
Inspector
Notifier
Admin
Conf.
Monitor
Zabbix
Monitor Agent
(Inspector)
Nova
Ceilometer
User
Configuration
Fault Messaging
HTTP POST(Switch down)
HTTP POST(Host A down, VM A1-A3 down)
HTTP POST(VM A1 down)
HTTP POST(Alert: VM A1 down)
HTTP POST(Create Alarm)
Threshold
Enable
Enable
vSwitch
collectd
Admin Threshold
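Case 2 differs from Case 1 in that the monitor decides a fault has occurred by comparing a metric against a threshold set by the administrator. The following sketch shows that check under stated assumptions: the metric names (`vswitch.rx_errors`, `vswitch.tx_errors`), the threshold values, the `enabled` flag, and the event dictionary are all hypothetical and do not reflect collectd, Zabbix, or Doctor formats.

```python
import json


def check_threshold(metric, value, thresholds):
    """Return a fault event if an enabled threshold is exceeded, else None."""
    rule = thresholds.get(metric)
    if rule is None or not rule["enabled"]:
        return None
    if value <= rule["max"]:
        return None
    return {"type": "threshold_exceeded", "metric": metric,
            "value": value, "threshold": rule["max"]}


# Admin configuration: the "Threshold" and "Enable" labels on the slide.
admin_thresholds = {"vswitch.rx_errors": {"max": 100, "enabled": True},
                    "vswitch.tx_errors": {"max": 100, "enabled": False}}

below = check_threshold("vswitch.rx_errors", 42, admin_thresholds)
disabled = check_threshold("vswitch.tx_errors", 500, admin_thresholds)
event = check_threshold("vswitch.rx_errors", 250, admin_thresholds)

# Body that a monitor agent could send to the Inspector in an HTTP POST.
body = json.dumps(event, sort_keys=True)
```

Only the third call produces an event: the first value is under the threshold and the second metric has its threshold disabled.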
22
Backup
23
Fault Management Sequence (Optional)
Virtualized Infrastructure
Applications
VIM User and Administrator
Virtualized Infrastructure Manager (VIM) = OpenStack
Virtual Compute
Virtual Storage
Virtual Network
Virtualization Layer
Hardware Resources
App App App
Auto Reaction
Detection
Reaction
24
Fault Management Scenarios (Optional)
Virtualized Infrastructure
Monitor (multiple instances)
Inspector
Controller (multiple instances)
Notifier
Applications
User-side Manager
Admin-side Manager
Alarm Conf.
Resource Map
Failure Policy
Auto Reaction
0. Set Alarm
1. Raw Failure
2. Find Affected
3. Update State
4. Notify all
4. (alt) Notify
5. Notify Error
6-. Action
25
Configuration / Policy Enforcement
User
NFVI
Conf.
Policy
Inspector
Notifier
Admin
Policy Service
Conf.
Monitor
Configuration
Fault Messaging
Option 1: Policy Service Integration
Option 2: Using Metadata in Controller
Metadata
Threshold
Enable
Metadata
Controller
Policy
Threshold
Enable
26
Case 3: Threshold Exceeded Fault (User Config)
User
NFVI
Conf.
Policy
Controller
Inspector
Notifier
Admin
Conf.
Monitor
Zabbix
Monitor Agent
(Inspector)
Nova
Ceilometer
User
Configuration
Fault Messaging
HTTP POST(Switch down)
HTTP POST(Host A down, VM A1-A3 down)
HTTP POST(VM A1 down)
HTTP POST(Alert: VM A1 down)
HTTP POST(Create Resource with Policy Label)
vSwitch
collectd
Admin
Policy Service
Enable
Threshold
Enable
Threshold
Policy
Congress
HTTP POST(Set Policy)
HTTP POST(Data)
Metadata