Autonomous Recovery in Componentized Internet Applications
Candea et al.
Vikram Negi

Introduction

• Autonomic Problem

• Approach

• Results

• Discussion

The Autonomic Problem

• To allow the application to recover automatically from transient and intermittent software failures.

The Approach

• Introduce three ideas:

– Macroanalysis (fault detection and localization)

– Microrebooting (rapid recovery)

– External management (of recovery actions)

• Integrate and test with JBoss

Design Overview

• Autonomous process

– Monitoring: Java probes

– Fault detection: generate an anomaly report

– Recovery: take action

• Total time to recovery (TRT)

J2EE Review

• J2EE enterprise apps = collection of reusable Java modules

• JSPs / servlets invoke EJBs, which invoke other EJBs, ...

• EJB = Java component that complies with a certain interface and provides a service

• Deployment descriptor (per-bean XML file) conveys run-time characteristics and dependencies; used in deploying the application

JBoss Design

• Open-source J2EE application server

• Written entirely in Java

• Microkernel whose components are held together by JMX (management support)

JAGR = ROC-ified (Recovery-Oriented Computing) JBoss with Application-Generic Recovery

• 3-tier architecture

• Key components

– Macroanalysis engine

– Microrebooting hooks

– Recovery manager

Pinpoint: Detection and Localization

• Stored observations

– IP address of the machine, timestamp

– Globally unique request ID

– Number of calls/returns to EJBs

– Association between sender and receiver

– SQL queries, updates, and reads
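
To make the stored observations concrete, here is a minimal Python sketch of one trace record and of how per-component call counts might be tallied from a batch of them. The `Observation` type, its field names, and the component names in the usage lines (`ViewItemServlet`, `DB`) are hypothetical illustrations, not Pinpoint's actual data format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical trace record: one component calling another during a request."""
    host_ip: str      # IP address of the machine that emitted the record
    timestamp: float  # when the call was observed (seconds since the epoch)
    request_id: str   # globally unique ID shared by all records of one request
    caller: str       # sending component (servlet, JSP, or EJB)
    callee: str       # receiving component (EJB or database)


def interaction_counts(observations):
    """Tally calls per (caller, callee) pair: the sender/receiver association."""
    counts = {}
    for obs in observations:
        counts.setdefault(obs.caller, Counter())[obs.callee] += 1
    return counts


def requests_touching(observations, component):
    """IDs of the requests whose path included the given component."""
    return {o.request_id for o in observations if component in (o.caller, o.callee)}


trace = [
    Observation("10.0.0.1", 0.00, "r1", "ViewItemServlet", "SB_ViewItem"),
    Observation("10.0.0.1", 0.01, "r1", "SB_ViewItem", "DB"),
    Observation("10.0.0.2", 0.02, "r2", "ViewItemServlet", "SB_ViewItem"),
]
print(interaction_counts(trace)["ViewItemServlet"])  # Counter({'SB_ViewItem': 2})
print(requests_touching(trace, "DB"))                # {'r1'}
```

Keying every record by request ID is what lets the analysis later ask which requests passed through a suspect component.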

Pinpoint: Analysis

• Analysis engine

– Centralized engine

– Plugin-based architecture

• Modeling components

– Assume that present component behavior and historical (normal) behavior follow the same probability distribution

– Chi-square test to detect when the two distributions differ
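
The chi-square comparison can be sketched as follows: for one component, compare the counts of calls it made to each callee in the current window against the proportions seen during normal operation. This is a plain Pearson goodness-of-fit test written for illustration, not Pinpoint's plugin code. Dividing the statistic by the 95% critical value, so that a score above 1.0 means "reject the same-distribution hypothesis", is an assumption made here; the slides only say that the recovery policy considers components scoring above 1.0.

```python
import math

# 95th-percentile critical values of the chi-square distribution, by degrees of freedom.
# A real implementation would compute these with a statistics library.
CHI2_CRITICAL_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_statistic(observed, historical):
    """Pearson statistic of current callee counts vs. historical (normal) counts."""
    total_obs = sum(observed.values())
    total_hist = sum(historical.values())
    if total_obs == 0 or total_hist == 0:
        return 0.0
    # A callee never seen in normal operation cannot be explained by the model.
    if any(historical.get(c, 0) == 0 and n > 0 for c, n in observed.items()):
        return math.inf
    stat = 0.0
    for callee, hist_count in historical.items():
        expected = total_obs * hist_count / total_hist  # count implied by history
        if expected > 0:
            stat += (observed.get(callee, 0) - expected) ** 2 / expected
    return stat


def anomaly_score(observed, historical):
    """Hypothetical score: statistic / critical value, so > 1.0 is anomalous at 5%."""
    df = sum(1 for n in historical.values() if n > 0) - 1
    if df < 1:
        return 0.0
    return chi_square_statistic(observed, historical) / CHI2_CRITICAL_95[df]


# Calls a (hypothetical) front-end component made to two beans.
normal = {"SB_ViewItem": 50, "SB_CommitBid": 50}
print(chi_square_statistic({"SB_ViewItem": 90, "SB_CommitBid": 10}, normal))  # 64.0
print(anomaly_score({"SB_ViewItem": 52, "SB_CommitBid": 48}, normal) < 1.0)   # True
```

A 90/10 split against a 50/50 history is flagged, while a 52/48 split stays well under the threshold.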

Recovery: Microrebooting Is Inexpensive

• State segregation: store important state outside the application

– Persistent state: kept in a database; CMP (container-managed persistence, J2EE) is a requirement for the prototype

– Session state: kept in a modified SSM (external session state store)

• Containment and reintegration

– Microreboot the transitive closure of all inter-EJB references

– XML deployment descriptors determine the grouping for the closure

– Complete reboot or microreboot
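
The grouping step can be sketched by reading bean references out of a deployment descriptor and collecting everything transitively linked to the faulty bean. This is a minimal sketch under several assumptions: the descriptor is a simplified, namespace-free `ejb-jar.xml` fragment, the bean names and references are hypothetical, and each reference is treated as linking both beans (one reading of "transitive closure", since rebooting either end invalidates the reference). It is not JAGR's actual code.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Simplified, namespace-free ejb-jar.xml fragment; bean names and references are hypothetical.
DESCRIPTOR = """<ejb-jar><enterprise-beans>
  <session><ejb-name>SB_CommitBid</ejb-name>
    <ejb-local-ref><ejb-link>User</ejb-link></ejb-local-ref>
    <ejb-local-ref><ejb-link>Item</ejb-link></ejb-local-ref>
  </session>
  <session><ejb-name>SB_ViewItem</ejb-name>
    <ejb-local-ref><ejb-link>Item</ejb-link></ejb-local-ref>
  </session>
  <session><ejb-name>SB_BrowseCategories</ejb-name>
    <ejb-local-ref><ejb-link>Category</ejb-link></ejb-local-ref>
  </session>
  <entity><ejb-name>User</ejb-name></entity>
  <entity><ejb-name>Item</ejb-name></entity>
  <entity><ejb-name>Category</ejb-name></entity>
</enterprise-beans></ejb-jar>"""


def reference_graph(descriptor_xml):
    """Map each bean to the beans it references or is referenced by."""
    root = ET.fromstring(descriptor_xml)
    graph = {}
    for bean in root.iter():
        if bean.tag not in ("session", "entity"):
            continue
        name = bean.findtext("ejb-name").strip()
        graph.setdefault(name, set())
        for link in bean.findall("ejb-ref/ejb-link") + bean.findall("ejb-local-ref/ejb-link"):
            target = link.text.strip()
            graph[name].add(target)
            # Treat the reference as symmetric: rebooting either end invalidates it.
            graph.setdefault(target, set()).add(name)
    return graph


def microreboot_group(graph, faulty_bean):
    """Breadth-first search for every bean transitively linked to the faulty one."""
    group, queue = {faulty_bean}, deque([faulty_bean])
    while queue:
        for neighbor in graph.get(queue.popleft(), ()):
            if neighbor not in group:
                group.add(neighbor)
                queue.append(neighbor)
    return group


graph = reference_graph(DESCRIPTOR)
print(sorted(microreboot_group(graph, "SB_BrowseCategories")))  # ['Category', 'SB_BrowseCategories']
```

Here a fault in SB_CommitBid pulls in User, Item, and (through Item) SB_ViewItem, while SB_BrowseCategories reboots only with Category. A real EJB 2.1+ descriptor declares an XML namespace, so tag lookups would need to be namespace-qualified.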

Recovery

• Enabling microreboots

– Implemented as a method in the JBoss EJB container

– Preserves the class loader

Manage Recovery

• Recovery Policy

– Read the failure report; consider components scoring above 1.0

– Microreboot the top n components, or all components scoring above 1.0

– Allow a delay (~30 s)

– If errors persist, retry a few times or reboot completely

– Finally, report the problem to the system administrator
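
The policy above can be sketched as an escalation loop. This is one interpretation of the slide, not the recovery manager's actual code: microreboot the suspects up to a few times, then fall back to a full reboot, then notify an administrator. The hook functions (`get_report`, `microreboot`, `full_reboot`, `notify_admin`) and the default of three microreboot attempts are hypothetical; only the 1.0 threshold and the ~30 s delay come from the slides.

```python
import time

THRESHOLD = 1.0      # components scoring above this are suspects (from the slides)
SETTLE_DELAY_S = 30  # ~30 s delay before re-checking (from the slides)


def suspects(report, top_n=None):
    """Components above the threshold, worst first; optionally only the top n."""
    ranked = sorted((c for c, s in report.items() if s > THRESHOLD),
                    key=lambda c: report[c], reverse=True)
    return ranked if top_n is None else ranked[:top_n]


def recover(get_report, microreboot, full_reboot, notify_admin,
            max_micro_attempts=3, top_n=None, sleep=time.sleep):
    """Escalating policy (hypothetical hooks): microreboots, full reboot, then a human."""
    for _ in range(max_micro_attempts):
        targets = suspects(get_report(), top_n)
        if not targets:
            return "recovered"
        microreboot(targets)
        sleep(SETTLE_DELAY_S)
    if not suspects(get_report(), top_n):
        return "recovered"
    full_reboot()
    sleep(SETTLE_DELAY_S)
    if not suspects(get_report(), top_n):
        return "recovered"
    notify_admin("failures persist after microreboots and a full reboot")
    return "escalated"


actions = []
result = recover(
    get_report=lambda: {"SB_ViewItem": 3.2, "SB_CommitBid": 0.4},  # never improves
    microreboot=lambda targets: actions.append(("microreboot", targets)),
    full_reboot=lambda: actions.append("full reboot"),
    notify_admin=lambda msg: actions.append("admin"),
    sleep=lambda seconds: None,  # skip the real delay in this demo
)
print(result)  # escalated
```

Passing the hooks in as functions keeps the policy separate from the mechanism, in the spirit of the external-management idea; the demo substitutes fakes so the escalation path can be observed without a server.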

Evaluation Test Framework

• Applications

– Petstore 1.1 (12 components, 233 Java files, 11K LoC)

– Petstore 1.3.1 (47 components, 310 Java files, 10K LoC)

– RUBiS (21 components, 500 Java files, 25K LoC)

• Workload

– Client simulators driven by transition tables

– 350 clients (max-utilization principle)

• Faultload

– Based on industry experience

– No low-level hardware or OS faults
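
A transition-table client simulator is essentially a Markov chain over pages: each emulated client picks its next request according to the row of the table for its current page. The sketch below assumes a hypothetical four-page table with made-up probabilities; a real emulator would also issue HTTP requests and insert think times, which are omitted here.

```python
import random

# Hypothetical transition table: current page -> {next page: probability}.
TRANSITIONS = {
    "Home": {"BrowseCategories": 0.6, "ViewItem": 0.4},
    "BrowseCategories": {"ViewItem": 0.7, "Home": 0.3},
    "ViewItem": {"CommitBid": 0.3, "BrowseCategories": 0.5, "Home": 0.2},
    "CommitBid": {"Home": 1.0},
}


def next_page(current, rng):
    """Pick the next page according to the current row of the transition table."""
    pages, weights = zip(*TRANSITIONS[current].items())
    return rng.choices(pages, weights=weights, k=1)[0]


def emulate_session(length, seed=None, start="Home"):
    """One emulated client's sequence of page requests (a Markov chain walk)."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(length - 1):
        path.append(next_page(path[-1], rng))
    return path


# 350 independent clients, each with its own seeded generator.
sessions = [emulate_session(10, seed=i) for i in range(350)]
```

Seeding each client separately makes a workload reproducible across trials, which matters when comparing full reboots against microreboots under the same load.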

Evaluation: Detection

• Results similar to other detectors

• No discussion of absolute numbers?

• Injected faults: forced Java runtime and declared exceptions, call omissions, and source code bugs

• Two questions: (1) how well were faults detected, and (2) how well were major outages detected?
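
The first three fault types can be illustrated with a wrapper around a component method. This is a Python stand-in, not the paper's injection tool: Python has no checked exceptions, so `DeclaredFault` plays the role of an exception listed in a Java `throws` clause and `RuntimeError` plays an unchecked Java `RuntimeException`. The decorator name, the fault labels, and `commit_bid` are hypothetical; source-code bug injection is not shown.

```python
import functools


class DeclaredFault(Exception):
    """Stand-in for a Java checked exception listed in a method's `throws` clause."""


def inject_fault(kind=None):
    """Wrap a component method so that calling it triggers the chosen fault."""
    def decorate(method):
        @functools.wraps(method)
        def wrapper(*args, **kwargs):
            if kind == "runtime":    # like an unchecked Java RuntimeException
                raise RuntimeError(f"injected runtime fault in {method.__name__}")
            if kind == "declared":   # like a checked (declared) Java exception
                raise DeclaredFault(f"injected declared fault in {method.__name__}")
            if kind == "omit":       # call omission: the method body never runs
                return None
            return method(*args, **kwargs)  # no fault: normal behavior
        return wrapper
    return decorate


bids = []


@inject_fault("omit")
def commit_bid(item_id, amount):
    bids.append((item_id, amount))
    return "committed"


print(commit_bid(42, 10.0), bids)  # None []
```

Call omission is the subtle case: nothing is thrown, so it only shows up as a change in the observed call paths, which is exactly what path-based detection looks for.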

Evaluation: Localization

• Localization rate per algorithm and fault type: component interaction analysis (CIA) > 85%

• No absolute data again?

Evaluation: Recovery

• Inject faults into SSM-RUBiS.

• Recover by restarting SSM-RUBiS or by microrebooting the faulty component.

• Observations from 10 trials, each with 350 concurrent clients.

Full Reboot vs. Microreboot

• Injected a null-reference fault into SB_CommitBid, then faults that corrupt User-Item, SB_BrowseCategories, and SB_CommitUserFeedback.

• Microrebooting maintains a steady response.

• 425 vs. 3,916 failed requests (microreboot vs. full reboot)

• 61,527 vs. 56,028 successful requests

• What error conditions did the other trials have?
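
The request counts can be turned into ratios with a few lines of arithmetic. The sketch assumes, as the ordering of the bullets suggests, that the first number in each pair comes from the microreboot run.

```python
# Request counts from the slide; assumes the first number of each pair is the microreboot run.
MICRO = {"failed": 425, "succeeded": 61527}
FULL = {"failed": 3916, "succeeded": 56028}


def failure_rate(run):
    """Fraction of all requests in the run that failed."""
    return run["failed"] / (run["failed"] + run["succeeded"])


fewer_failures = FULL["failed"] / MICRO["failed"]        # ~9.2x fewer failed requests
extra_successes = MICRO["succeeded"] - FULL["succeeded"]  # 5,499 more successful requests

print(f"{fewer_failures:.1f}x fewer failures, {extra_successes} more successes")
# 9.2x fewer failures, 5499 more successes
print(f"failure rate: {failure_rate(MICRO):.2%} vs {failure_rate(FULL):.2%}")
# failure rate: 0.69% vs 6.53%
```

Both runs served roughly 60K requests, so the drop from about 6.5% to about 0.7% failed requests is a fair like-for-like comparison.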

Total Recovery Time

• Corrupted SB_ViewItem by setting it to NULL

• 19.4 s total recovery time (TRT)

• 18.5 s spent in analysis

• Pinpoint is the bottleneck in microreboot-based recovery
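
Splitting the total recovery time into analysis and everything else shows why Pinpoint dominates. Here T_rest is a label introduced for this breakdown; it presumably covers reporting, the recovery decision, and the microreboot itself.

```latex
\mathrm{TRT} = T_{\text{analysis}} + T_{\text{rest}}
\;\Rightarrow\;
T_{\text{rest}} = 19.4\,\text{s} - 18.5\,\text{s} = 0.9\,\text{s},
\qquad
\frac{T_{\text{analysis}}}{\mathrm{TRT}} = \frac{18.5}{19.4} \approx 0.95
```

With roughly 95% of the recovery time spent in analysis, making the microreboot itself faster would barely change TRT; faster detection would.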

Is Pinpoint Application-Generic?

• Upgraded to Petstore 1.3.2

– Still works within the confidence interval

How different was the updated version?

Performance Overhead

• Results for a 30-minute fault-free run with 350 clients

• In-memory vs. out-of-memory (SSM) session state

• Marshalling costs
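
The marshalling cost comes from where session state lives. A minimal sketch, assuming Python's `pickle` as a stand-in for Java serialization and a local dict as a stand-in for the separate SSM process; the class names are hypothetical and this is not SSM's interface. An in-memory store just keeps a reference, while an external store must serialize on every write and deserialize on every read.

```python
import pickle


class InMemorySessionStore:
    """Session state held inside the server process: fast, but lost on reboot."""

    def __init__(self):
        self._sessions = {}

    def put(self, session_id, state):
        self._sessions[session_id] = state  # stores a reference; no copying

    def get(self, session_id):
        return self._sessions[session_id]


class ExternalSessionStore:
    """Stand-in for an out-of-process store such as SSM: state crosses as bytes."""

    def __init__(self):
        self._blobs = {}  # in a real deployment this lives in another process

    def put(self, session_id, state):
        self._blobs[session_id] = pickle.dumps(state)  # marshalling on every write

    def get(self, session_id):
        return pickle.loads(self._blobs[session_id])  # unmarshalling on every read


state = {"user": "alice", "bids": [101, 102]}
memory, external = InMemorySessionStore(), ExternalSessionStore()
memory.put("s1", state)
external.put("s1", state)
state["bids"].append(103)          # later in-place mutation by the application
print(memory.get("s1")["bids"])    # [101, 102, 103]
print(external.get("s1")["bids"])  # [101, 102]
```

The copy semantics are the price of surviving a microreboot: the external store keeps its bytes when the component restarts, but the application must write state back explicitly and pays the serialization cost on each request.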

Assumptions

• Well-defined component interfaces (.NET, J2EE)

• Deterministic call paths between components

• No critical service requests

• Training data for the statistical model

• Guidelines (crash-only software)

Discussion

• Overall, one of the good papers, though perhaps a bit verbose in the introduction!

• An integrating framework for Candea's earlier work

• Limitations of the present statistical model

• Shared EJB state

– Modify the JIT, or disable microreboots (references, static variables)

• Application global data is not scrubbed

• Cost-benefit: microreboot vs. total reboot

Supplementary

• Application server = operating system for Internet applications (instantiates application components in containers, provides runtime system services, integrates with the web server to make the application web-accessible)

• http://people.epfl.ch/george.candea