OPNFV Summit 2015

Doctor - Fault Management

Gerald Kunzmann, DOCOMO

Carlos Goncalves, NEC

Ryota Mibu, NEC

Doctor Overview

• Goal

– Build fault management and maintenance framework

• Approach

– Identify requirements

– Gap Analysis

– Implementation work in Upstream (OpenStack)

– Integration and testing

• Status

– Initial requirement study, architecture design, gap analysis: Done

– Collaborative Development: On-going (3 merged Blueprints in OpenStack Liberty)

– Standardization Sync: On-going (by NFV member efforts, joint meeting)

Doctor Members

• At project creation (Dec 2014)

– NTT DOCOMO, Sprint

– NEC, Nokia, Ericsson, Huawei, ClearPath Network, Cisco

• Now (Oct 2015)

– NTT DOCOMO, Sprint, AT&T, Telecom Italia, KDDI

– NEC, Nokia, Ericsson, Huawei, ClearPath Network, Cisco, Cloudbase Solutions, Spirent, Intel, ZTE

• Membership has roughly doubled (2x) since project creation

Assumption of VNF (NFV Application)

• Telco applications are typically deployed in an active-standby or active-active fashion

[Figure: two stacks side by side. App (Active) runs on a VM on a Machine; App (Standby) runs on a VM on a second Machine.]

• The App and the App Manager (VNFM) cannot detect HW failures directly

• The App state is switched over when a failure occurs

Use Case 1: Fault management

[Figure: Consumers C1, C2 and C3 sit above the OpenStack Northbound Interface of the Virtualized Infrastructure Manager (VIM), e.g. OpenStack. The VIM holds a Resource Map and manages a Resource Pool of three hardware servers (S1, S2, S3), each running a hypervisor. An X marks the fault.]

• Resource Map: Server – VM mapping

– Server S1: VM-1, VM-2

– Server S2: VM-7

– Server S3: VM-4

• Resource Map: Ownership information

– VM-1, VM-7: Consumer C1

– VM-2: Consumer C2

– VM-4: Consumer C3

• Sequence

– 1. Fault Monitoring: hardware fault, hypervisor fault, host OS fault

– 2. Inform the Consumer? If YES, find the owner of the affected VMs from the database

– 3. FaultNotification (VM ID, Fault ID), VIM to Consumer

– 4. Switch to SBY configuration

– 5. Instruction (VM ID), Consumer to VIM

– 6. Execute Instruction, e.g. migrate VM
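Steps 2 and 3 of this use case amount to two lookups in the resource map. The following is a minimal sketch of that lookup, not Doctor or OpenStack code: the map contents are copied from the slide, while the `FaultNotification` class, the `notify_for_fault` function and the fault ID string are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Resource map contents as shown on the slide.
SERVER_TO_VMS = {"S1": ["VM-1", "VM-2"], "S2": ["VM-7"], "S3": ["VM-4"]}
VM_OWNER = {"VM-1": "C1", "VM-7": "C1", "VM-2": "C2", "VM-4": "C3"}


@dataclass(frozen=True)
class FaultNotification:
    """Hypothetical message carrying the fields named in step 3."""
    consumer: str
    vm_id: str
    fault_id: str


def notify_for_fault(server: str, fault_id: str) -> list[FaultNotification]:
    """Step 2: find the VMs on the faulty server and their owners.
    Step 3: build one notification per affected VM."""
    return [
        FaultNotification(VM_OWNER[vm], vm, fault_id)
        for vm in SERVER_TO_VMS.get(server, [])
    ]


# Example: a fault on Server S1 affects VM-1 (Consumer C1) and VM-2 (Consumer C2).
for notification in notify_for_fault("S1", "HW-FAULT"):
    print(notification)
```

Keeping the server-to-VM mapping and the ownership information as separate tables mirrors the slide, where they appear as two parts of the same Resource Map.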

Use Case 2: Maintenance

[Figure: an Administrator and Consumers C1, C2 and C3 interact with the Virtualized Infrastructure Manager (VIM), e.g. OpenStack; the Consumers use the OpenStack Northbound Interface. The VIM holds a Resource Map and manages a Resource Pool of three hardware servers (S1, S2, S3), each running a hypervisor.]

• Resource Map: Server – VM mapping

– Server S1: VM-1, VM-2

– Server S2: VM-7

– Server S3: VM-4

• Resource Map: Ownership information

– VM-1, VM-7: Consumer C1

– VM-2: Consumer C2

– VM-4: Consumer C3

• Sequence

– 1. Maintenance Request (Server S3), Administrator to VIM

– 2. Which VMs are affected? Find the Consumer owning the VM(s) from the database

– 3. Maintenance Notification (VM ID), VIM to Consumer

– 4. Switch to SBY configuration

– 5. Instruction (VM ID), Consumer to VIM

– 6. Execute Instruction, e.g. migrate VM
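The maintenance flow can be sketched from the administrator's side: given a server, group the affected VMs per consumer, then check whether instructions have arrived for all of them. This is an illustrative sketch only. The function names are hypothetical, and the rule that maintenance may proceed once every VM on the server has received an instruction is an assumption, not something the slide states.

```python
from collections import defaultdict

# Resource map contents as shown on the slide.
SERVER_TO_VMS = {"S1": ["VM-1", "VM-2"], "S2": ["VM-7"], "S3": ["VM-4"]}
VM_OWNER = {"VM-1": "C1", "VM-7": "C1", "VM-2": "C2", "VM-4": "C3"}


def maintenance_notifications(server):
    """Steps 2-3: group the VMs on the server by the consumer that owns them."""
    per_consumer = defaultdict(list)
    for vm in SERVER_TO_VMS.get(server, []):
        per_consumer[VM_OWNER[vm]].append(vm)
    return dict(per_consumer)


def ready_for_maintenance(server, instructed_vms):
    """Assumed rule: proceed once an instruction (step 5) covers every VM."""
    return set(SERVER_TO_VMS.get(server, [])) <= set(instructed_vms)


# The slide's example: a maintenance request for Server S3.
print(maintenance_notifications("S3"))
print(ready_for_maintenance("S3", ["VM-4"]))
```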

Fault Management Sequence

[Figure: layered NFV stack, top to bottom]

• Applications: App, App, App

• VIM User and Administrator

• Virtualized Infrastructure Manager (VIM) = OpenStack

• Virtualized Infrastructure

– Virtual Compute, Virtual Storage, Virtual Network

– Virtualization Layer

– Hardware Resources

• Labels on the figure: Detection, Reaction, Doctor Scope

Key Requirements as VIM

• Immediate Notification

• Consistent Resource State Awareness

• Extensible Monitoring

• Fault Correlation

Doctor Architecture and Typical Scenario

[Figure: architecture diagram]

• Components

– Application, Manager

– Notifier

– Controller (three instances)

– Inspector

– Monitor (three instances)

– Virtualized Infrastructure (Resource Pool)

• Data shown: Alarm Conf., Resource Map, Failure Policy

• Typical scenario

– 0. Set Alarm

– 1. Raw Failure

– 2. Find Affected

– 3. Update State

– 4. Notify all

– 5. Notify Error

– 6-. Action
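The numbered scenario can be walked through in a few lines of code. The sketch below is a toy in-memory model, not Doctor code. It assumes one reading of the diagram: the Manager sets the alarm at the Notifier, a Monitor reports the raw failure to the Inspector, the Inspector uses the Controller's resource map and updates state there, and the Notifier calls back the Manager. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    resource_map: dict                          # host -> list of VM IDs
    state: dict = field(default_factory=dict)   # VM ID -> state

    def update_state(self, vm, state):          # step 3: Update State
        self.state[vm] = state
        return {"vm": vm, "state": state}


@dataclass
class Notifier:
    alarms: list = field(default_factory=list)  # (state, callback) pairs

    def set_alarm(self, state, callback):       # step 0: Set Alarm
        self.alarms.append((state, callback))

    def notify(self, event):                    # step 4 in, step 5 out
        for state, callback in self.alarms:
            if event["state"] == state:
                callback(event)


@dataclass
class Inspector:
    controller: Controller
    notifier: Notifier

    def on_raw_failure(self, host):             # step 1: Raw Failure
        # Step 2: Find Affected, using the resource map.
        for vm in self.controller.resource_map.get(host, []):
            self.notifier.notify(self.controller.update_state(vm, "error"))


# The Manager's reaction (step 6 onward) is recorded as a simple list here.
actions = []
notifier = Notifier()
notifier.set_alarm("error", lambda e: actions.append(("switch-to-standby", e["vm"])))
controller = Controller({"S1": ["VM-1", "VM-2"]})
Inspector(controller, notifier).on_raw_failure("S1")
print(actions)
```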

Doctor OSS Map

[Figure: the diagram from "Doctor Architecture and Typical Scenario" (steps 0 to 6-), with open source software mapped onto the components]

• Notifier: Ceilometer

• Controller: Cinder, Neutron, Nova

• Inspector: e.g. Monasca

• Monitor: e.g. Zabbix

Doctor OSS Development

[Figure: the diagram from "Doctor Architecture and Typical Scenario" (steps 0 to 6-), with open source software and development items mapped onto the components]

• Notifier: Ceilometer, with Event Alarm

• Controller: Cinder, Neutron, Nova, with State Correction

• Monitor: e.g. Zabbix

• Inspector: e.g. Monasca

Doctor Blueprints in Liberty Cycle

Project     Blueprint                                     Spec Drafter               Developer                  Status

Ceilometer  Event Alarm Evaluator                         Ryota Mibu (NEC)           Ryota Mibu (NEC)           Completed (Liberty)

Nova        New nova API call to mark nova-compute down   Tomi Juvonen (Nokia)       Roman Dobosz (Intel)       Completed (Liberty)

Nova        Support forcing service down                  Tomi Juvonen (Nokia)       Carlos Goncalves (NEC)     Completed (Liberty)

Nova        Get valid server state                        Tomi Juvonen (Nokia)       (not listed)               Spec approved (Mitaka)

Nova        Add notification for service status change    Balazs Gibizer (Ericsson)  Balazs Gibizer (Ericsson)  Waiting for spec approval (Mitaka)

Doctor BP Detail: Nova – Mark Nova-Compute Down

[Figure: a Host / Machine containing Hypervisor, VM, vSwitch, BMC and nova-compute, alongside nova-api, nova-conductor, nova-scheduler, the nova DB and the queue. An External Monitoring Service and a Monitoring Client are also shown.]

• EXISTING (periodic update): service state

• NEW: Force-down API, an API to update the nova-compute service state
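For orientation, here is roughly what a call to the force-down API looks like from an external monitoring service. The sketch only builds the HTTP request and does not send it. The path `os-services/force-down`, the request body fields and compute API microversion 2.11 follow the OpenStack Compute API as released in Liberty; the endpoint URL, the token and the host name are placeholders, and `build_force_down_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_force_down_request(nova_endpoint, token, host, forced_down=True):
    """Build (but do not send) a request marking nova-compute on `host` as down."""
    body = {"host": host, "binary": "nova-compute", "forced_down": forced_down}
    return urllib.request.Request(
        url=f"{nova_endpoint}/os-services/force-down",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,
            # Force-down requires compute API microversion 2.11 or later.
            "X-OpenStack-Nova-API-Version": "2.11",
        },
    )


# Placeholder endpoint, token and host name, for illustration only.
req = build_force_down_request("http://controller:8774/v2.1", "TOKEN", "compute-1")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending the same request with `forced_down=False` would clear the flag again once the host has recovered.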

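Doctor BP Detail: Ceilometer - Event Alarm

[Figure: notifications from Cinder, Neutron and Nova flow through Ceilometer towards the Manager. An Audit Service is also shown.]

• EXISTING (polling-based): notification, sample, stats, evaluator

• NEW Shortcut (notification-based): notification, event, notification-driven alarm evaluator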
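The difference between the two paths is that a notification-driven evaluator reacts when an event arrives instead of waiting for the next polling interval. The sketch below illustrates that idea with a toy evaluator. It is not Ceilometer code: the rule shape (an event type pattern plus a query on event traits) is modelled loosely on event alarm rules, and every name in it is hypothetical.

```python
import fnmatch

# Comparison operators supported by this toy rule format.
OPS = {"eq": lambda a, b: a == b, "ne": lambda a, b: a != b}


def event_matches(rule, event):
    """Return True if the event type fits the pattern and every query term holds."""
    if not fnmatch.fnmatchcase(event["event_type"], rule["event_type"]):
        return False
    return all(
        OPS[q["op"]](event["traits"].get(q["field"]), q["value"])
        for q in rule["query"]
    )


class EventAlarmEvaluator:
    """Evaluates alarms on each incoming event; there is no polling loop."""

    def __init__(self):
        self.alarms = []  # (rule, action) pairs

    def add_alarm(self, rule, action):
        self.alarms.append((rule, action))

    def on_event(self, event):
        fired = 0
        for rule, action in self.alarms:
            if event_matches(rule, event):
                action(event)
                fired += 1
        return fired


# Example: alarm when an instance update reports the "error" state.
received = []
evaluator = EventAlarmEvaluator()
evaluator.add_alarm(
    {"event_type": "compute.instance.update",
     "query": [{"field": "state", "op": "eq", "value": "error"}]},
    received.append,
)
evaluator.on_event({"event_type": "compute.instance.update",
                    "traits": {"instance_id": "VM-1", "state": "error"}})
print(received)
```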

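Doctor Southbound API

[Figure: User and Admin on one side, NFVI on the other, with the Doctor components in between]

• Components: Controller, Inspector, Notifier, Monitor (three instances)

• Data shown: Conf., Policy

• Interfaces: Configuration, Fault Messaging, Unified Event API

• Monitor settings: Threshold, Enable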

Doctor Status

• Components and OSS

– Notifier: Ceilometer

– Monitor: Zabbix

– Controller: Nova, Neutron, Cinder

– Inspector: Monasca?

– DPDK

• Legend: Done, Next Step

• Phases: To-Be Arch. Design, Gap Analysis, Blueprint, Coding, Integration, OPNFV Release

• Timeline marks: Dec 2014, Mar 2015, Sep 2015, Feb 2016

Don’t miss out...

• “Doctor – Fault Management” Project Theater, Wednesday, 3:55 pm – 4:15 pm

• “Doctor: Failure Detection and Notification for NFV” DOCOMO booth, PoC Demo Zone