Generic Adaptive Control

Contact: Joe Hellerstein

IBM Thomas J Watson Research Center

hellers@us.ibm.com

May 16, 2003

http://www.research.ibm.com/PM

Page 2

Participants

Research: Joe Bigus (ABLE), Markus Debusman (University of Applied Science, Wiesbaden, Germany), Yixin Diao, Frank Eskesen, Steve Froehlich, Joe Hellerstein, Alexander Keller, Xue Lui (Univ. of Illinois), Sujay Parekh, Lui Sha (Univ. of Illinois), Maheswaran Surendra (team lead), Dawn Tilbury (Univ. of Michigan)

DB2: Randy Horman, Matt Huras, Ed Lassettre, Sam Lightstone, Kevin Rose, Adam Storm

Server Group: Lisa Spainhower

WebSphere: Carolyn Norton

HVWS: Noshir Wadia, Eric Ye

Page 3

Example: Configuration & Optimization in WebSphere

[Figure: end users, web servers, application servers, and a DB, with an administrator facing the tuning knobs KeepAlive Timeout, MaxClients, MaxRequestsPerChild, ThreadsPerChild, ListenBackLog, Number of Threads, DB Connections, Fast response cache, Max simultan. requests, URL Cache, EJB threads, JVM heap size, and Servlet reload int.]

Challenges:
Skill shortage
Multiple vendors, multiple standards
Mapping policies to IT “knobs”

Page 4

Project Goals

Develop a formal basis for resource management problems with dynamics (especially policy enforcement)

Demonstrate the practical value of the approach

Evangelize the approach
Book, tutorials, classes
Methodology and tools

Page 5

Agenda

Basics of Control Theory
Regulating concurrent users in Lotus Notes: pole placement design
Regulating utilizations in Apache
Optimizing response times in Apache
Throttling DB2 utilities
DB2 self-tuning memory
Regulating service levels in a multi-tiered eCommerce system (HotRod)
Educational efforts (book, tutorials)
Summary

Page 6

Control of Lotus Notes eMail Server

[Figure: block diagram. The administrator sets a target queue length; the AutoTune agent compares it with the measured queue length and adjusts MaxUsers on the Lotus Notes server, which receives RPCs from a workload generator.]

[Figure: queue-length responses for different controller gains K. Uncontrolled; K=.1: slow; K=1: better; K=5: bad.]

Page 7

System Identification: Estimate Transfer Function

[Figure: the Notes server as a block with input MaxUsers, $u(t)$, and output actual queue length, $q(t)$.]

Dynamic model:

$$q(t) = a_1\, q(t-1) + b_0\, u(t)$$

Estimated parameters: $a_1 = 0.319$, $b_0 = 0.550$, $R^2 = 0.79$

[Figure: predicted QL versus observed QL, both axes 0 to 100.]
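A minimal sketch of how a first-order model of this form might be estimated by least squares from logged input and output data. The function name `fit_first_order` and the synthetic data are assumptions made for illustration; they are not the tooling or measurements used in the project.

```python
def fit_first_order(q, u):
    """Least-squares estimate of (a1, b0) in q(t) = a1*q(t-1) + b0*u(t)."""
    # Regressors x1 = q(t-1), x2 = u(t); target y = q(t), for t = 1..n-1.
    s11 = s12 = s22 = s1y = s2y = 0.0
    for t in range(1, len(q)):
        x1, x2, y = q[t - 1], u[t], q[t]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        s1y += x1 * y
        s2y += x2 * y
    # Solve the 2x2 normal equations by Cramer's rule.
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("input does not vary enough to identify the model")
    a1 = (s1y * s22 - s2y * s12) / det
    b0 = (s2y * s11 - s1y * s12) / det
    return a1, b0


# Synthetic check: noise-free data generated from made-up parameters.
true_a1, true_b0 = 0.5, 2.0
u = [10, 10, 30, 30, 20, 40, 15, 25, 35, 5]
q = [0.0]
for t in range(1, len(u)):
    q.append(true_a1 * q[t - 1] + true_b0 * u[t])

a1_hat, b0_hat = fit_first_order(q, u)
```

With real measurements the fit is not exact, and a statistic such as $R^2$ indicates how much of the variation in queue length the model explains.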

Page 8

Controller Design

[Figure: feedback loop. The error $e(t)$, formed by subtracting the output of the sensor S(z) from the reference, drives the controller G(z), which drives the Notes server N(z). H(z) = closed-loop transfer function.]

Simplified integral control law:

$$u(t) = u(t-1) + K\, e(t)$$

Design for “poles” of H(z)

[Figure: panels labeled K=1 and K=5.]
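The effect of the gain K on the closed-loop poles can be sketched as follows. This is a simplified illustration, assuming a first-order plant, an ideal sensor that reports the previous queue length, and made-up plant parameters `A` and `B`; the actual design also accounts for sensor dynamics S(z).

```python
import cmath


def closed_loop_poles(a, b, K):
    """Poles of the loop formed by
         plant:      q(t) = a*q(t-1) + b*u(t)
         controller: u(t) = u(t-1) + K*e(t),  e(t) = r - q(t-1)
    Eliminating u gives q(t) = (1 + a - b*K)*q(t-1) - a*q(t-2) + b*K*r,
    so the poles are the roots of z^2 - (1 + a - b*K)*z + a = 0.
    """
    c1 = 1 + a - b * K
    disc = cmath.sqrt(c1 * c1 - 4 * a)
    return (c1 + disc) / 2, (c1 - disc) / 2


def is_stable(a, b, K):
    # A discrete-time loop is stable when every pole lies inside the unit circle.
    return all(abs(p) < 1 for p in closed_loop_poles(a, b, K))


def simulate(a, b, K, r, steps):
    """Run the loop from rest and return the final queue length."""
    q = u = 0.0
    for _ in range(steps):
        e = r - q          # error uses the previous queue length
        u = u + K * e      # integral control law
        q = a * q + b * u  # plant response
    return q


A, B = 0.3, 0.6  # illustrative plant parameters (assumed, not measured)
final_q = simulate(A, B, 1.0, 50.0, 200)
```

Under these assumptions a small gain leaves a pole close to 1 (slow convergence), a moderate gain converges quickly, and a large gain pushes a pole outside the unit circle.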

Page 9

Control of Apache Server

[Figure: block diagram. The administrator supplies policies and receives reports; the AutoTune agent reads CPU utilization and memory utilization and adjusts MaxClients and KeepAlive TO on the Apache system, which serves Web Service requests from a workload generator.]

Contribution: Multiple Input, Multiple Output

Page 10

Apache Control Enablements

[Figure: architecture of the web server. The master process spawns and kills worker processes. mod_controller (shown in close-up) holds KeepAlive, MaxClients, and service-time stats in shared memory, exposes a Get/Set interface, and hosts an internal controller. CPU util and mem util come from the OS (procfs). An external controller and an external RT probe reach the web server over HTTP. Legend: process, inter-process value flow, RT info, GET/SET.]

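Page 11

Model Structure

[Figure: the Apache server with inputs KA and MC and outputs CPU and MEM.]

The Transfer Function Relationship

MIMO model:

$$\begin{bmatrix} \mathrm{CPU} \\ \mathrm{MEM} \end{bmatrix} = \begin{bmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{bmatrix} \begin{bmatrix} \mathrm{KA} \\ \mathrm{MC} \end{bmatrix}$$

Two SISO models:

$$\begin{bmatrix} \mathrm{CPU} \\ \mathrm{MEM} \end{bmatrix} = \begin{bmatrix} G_{11} & 0 \\ 0 & G_{22} \end{bmatrix} \begin{bmatrix} \mathrm{KA} \\ \mathrm{MC} \end{bmatrix}$$

SISO vs. MIMO: the SISO approach assumes cross terms are negligible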

Page 12

Model Comparison

[Figure: two panels, “Two SISO Models” and “MIMO Model”. Each plots CPU (0 to 1), MEM (0 to 1), KA (0 to 20), and MC (0 to 1000) against Time (0 to 1500 s), with the model prediction overlaid.]

CPU: The SISO model fails because MC and KA both affect CPU; the MIMO model is able to capture this relationship.

MEM: Both models do a good job of predicting system response.
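The difference between the two model structures can be illustrated with a small sketch. It assumes static gains in place of the dynamic transfer functions, and every gain value is hypothetical, chosen only so that MC has a cross effect on CPU.

```python
def predict(gains, ka, mc):
    """Static two-input, two-output model; returns (cpu, mem).

    gains is ((g11, g12), (g21, g22)), rows = outputs (CPU, MEM),
    columns = inputs (KA, MC).
    """
    (g11, g12), (g21, g22) = gains
    return g11 * ka + g12 * mc, g21 * ka + g22 * mc


# Hypothetical gains: CPU depends on both KA and MC, MEM only on MC.
MIMO = ((0.02, 0.0005), (0.0, 0.001))
# The two-SISO structure keeps only the diagonal terms.
SISO = ((MIMO[0][0], 0.0), (0.0, MIMO[1][1]))

cpu_mimo, mem_mimo = predict(MIMO, ka=10, mc=600)
cpu_siso, mem_siso = predict(SISO, ka=10, mc=600)
```

In this sketch the two structures agree on MEM, while the SISO structure underpredicts CPU by exactly the dropped cross term.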

Page 13

Optimization of Apache Server

[Figure: block diagram. The AutoTune agent observes response time and adjusts MaxClients on the Apache system, which serves Web Service requests from a workload generator.]

Page 14

Apache Operation

[Figure: new users open new connections, which wait in the TCP accept queue until one of Apache's MaxClients slots is free; connections end through Close() or Timeout().]

Heuristic: Find the smallest MaxClients that eliminates TCP queueing

Page 15

Impact of MaxClients

[Figure: response time (sec., 0 to 100) versus MaxClients (0 to 900), with the Apache defaults marked.]

Page 16

AutoTune Using Fuzzy Rules

[Figure: fuzzy controller consisting of fuzzification, an inference mechanism with a rule base, and defuzzification; a d/dt block supplies rate-of-change inputs.]

• Fuzzification
– Convert numeric variables to linguistic variables
– Characterized by membership functions

• Rule base
– IF-THEN rules
– Using linguistic variables

• Inference mechanism
– Activate the fuzzy rules (IF)
– Combine the rule actions (THEN)

• Defuzzification
– Convert linguistic variables to numeric variables

Page 17

Constructing Fuzzy Rules

[Figure: Response Time (RT) versus MaxClients curve, annotated with the regions where Rule 1, Rule 2, Rule 3, and Rule 4 apply.]

• Rule 1: IF change-in-MaxClients is poslarge and change-in-RT is neglarge THEN next-change-in-MaxClients is poslarge

• Rule 2: IF change-in-MaxClients is neglarge and change-in-RT is poslarge THEN next-change-in-MaxClients is poslarge

• Rule 3: IF change-in-MaxClients is neglarge and change-in-RT is neglarge THEN next-change-in-MaxClients is neglarge

• Rule 4: IF change-in-MaxClients is poslarge and change-in-RT is poslarge THEN next-change-in-MaxClients is neglarge

Decision making:
- Increment direction
- Increment size
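The four rules can be sketched in code as follows. This is a simplified illustration, not the AutoTune implementation: the ramp-shaped membership functions, the normalization of inputs to [-1, 1], and the defuzzification step (scaling each consequent by its rule's firing strength) are all assumptions.

```python
def poslarge(x):
    """Membership of x in 'positive large'; x is normalized to [-1, 1]."""
    return min(max(x, 0.0), 1.0)


def neglarge(x):
    """Membership of x in 'negative large'; x is normalized to [-1, 1]."""
    return min(max(-x, 0.0), 1.0)


# (membership for change-in-MaxClients, membership for change-in-RT, consequent)
# A consequent of +1.0 stands for poslarge and -1.0 for neglarge.
RULES = [
    (poslarge, neglarge, +1.0),  # Rule 1: increase helped, keep increasing
    (neglarge, poslarge, +1.0),  # Rule 2: decrease hurt, go back up
    (neglarge, neglarge, -1.0),  # Rule 3: decrease helped, keep decreasing
    (poslarge, poslarge, -1.0),  # Rule 4: increase hurt, go back down
]


def next_change(d_mc, d_rt):
    """Normalized next change in MaxClients: the sign gives the direction,
    the magnitude gives the relative increment size."""
    total = 0.0
    for mc_member, rt_member, consequent in RULES:
        strength = min(mc_member(d_mc), rt_member(d_rt))  # fuzzy AND
        total += strength * consequent
    return total
```

For example, `next_change(0.5, -0.2)` activates only Rule 1 with strength 0.2, so the controller keeps increasing MaxClients, but by a small step.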

Page 18

AutoTune Controlling MaxClients on Apache

[Figure: MaxClients moving from the Apache default to the optimized setting.]

Page 19

AutoTune Response to a new workload

[Figure: after the workload changes, MaxClients moves from the old optimized setting to the new optimized setting.]

Page 20

DB2 UDB Utilities Throttling (SMART Project)

[Figure: on the server, the utilities Backup, Restore, and Re-Balance run alongside the UDB engine; measured disk and CPU utilizations are compared with a target utilization to set the sleep delay.]

Page 21

Backup and Restore Process Model

[Figure: process model showing the db2agent process, db2bm processes, db2med processes, DB2 backup/restore buffers, DB2 I/O processors, the OS, the database, and the source/destination.]

Page 22

Success Is:

High System Utilization
Small Effect on User Throughput

[Figure: % utilization (scale to 1) over time; the gap is due to reduced utilization in sleep periods. Throughput (T.P.) is shown without the utility and with the utility.]

Note: This is a longer-time averaged value than on slide 5.

Page 23

Throttling a Single Utility

Standard PI controller tries to reach E=0
Assume: linear effect of throttling on Y

[Figure: control loop with blocks DBA, Controller, DB2, Baseline Estimation, Model Estimation, and Compute Degradation, and signals R, E, U, Y, Y*, M, and WL. Inside DB2, the utility contributes $aU$ and the workload contributes $b$.]

$$Y = aU + b$$

$$M = \frac{Y^* - Y}{Y^*}$$

$$E = R - M$$

Legend: parameters characterizing DB2, $(a, b)$; control error, $E$; thruput degradation, $M$; max thruput from utility + workload.
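A minimal sketch of a PI loop of this kind, under stated assumptions: U is taken to be the utility's sleep fraction in [0, 1], workload throughput follows a linear model Y = a*U + b, and the baseline Y* is taken to be a + b (throughput with the utility fully asleep). The function name, gains, and numeric values are hypothetical.

```python
def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def throttle(a, b, target, kp, ki, steps):
    """PI loop on the sleep fraction U of a utility.

    Assumed plant: workload throughput Y = a*U + b, so the baseline
    is Y* = a + b (utility fully asleep, U = 1).
    Returns the final sleep fraction and the last measured degradation.
    """
    y_star = a + b
    u = 0.0           # start with the utility unthrottled
    integral = 0.0
    degradation = 0.0
    for _ in range(steps):
        y = a * u + b                         # measured throughput
        degradation = (y_star - y) / y_star   # M
        error = target - degradation          # E = R - M
        integral += error
        # More sleep lowers degradation, so the controller acts with
        # a negative sign on the error.
        u = clamp(-kp * error - ki * integral, 0.0, 1.0)
    return u, degradation


# Hypothetical numbers: 1600 stmt/sec with the utility at full speed,
# 2000 stmt/sec with it asleep, and a 5% degradation target.
sleep_fraction, final_degradation = throttle(
    a=400.0, b=1600.0, target=0.05, kp=1.0, ki=2.0, steps=200
)
```

With these numbers the degradation is 0.2*(1 - U), so the loop should settle at a sleep fraction of 0.75, where the measured degradation equals the 5% target.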

Page 24

Baseline Measurement: idling

[Figure: timeline for processes P1, P2, and P3 with control points Start1, End1, Start2, and End2, alternating between Sleep_Tm = n% and Sleep_Tm = 100%.]

“Loop” throughput $r_l$ and “other” (sleep) throughput $r_o$ are each computed as the change in perf output $p$ over the elapsed time $t$ between the Start and End points of the corresponding interval:

$$r = \frac{p_{End} - p_{Start}}{t_{End} - t_{Start}}$$

• “Start” is perf output after all Pi have read new control value.
• “End” is from closest output to control change

Page 25

Baseline Estimation

Over time, record sequence $\{(t_i, p_i, s_i)\}$
t = Time
p = Perf at time t
s = SleepPct at time t

Fit a “curve” to this data, to get model M
E.g., over some fixed time interval of the past

[Figure: p versus s, with s running up to 1.]
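One way to sketch this estimation step is a straight-line fit over a sliding time window. Both the choice of a linear “curve” and the reading of the baseline as the fitted value at s = 1 (utility fully asleep) are assumptions; the function name and sample values are hypothetical.

```python
def fit_baseline(samples, now, window):
    """samples: iterable of (t, p, s) tuples.

    Fits p = a*s + b by least squares over samples no older than
    `window` and returns the predicted perf at s = 1.
    """
    recent = [(s, p) for (t, p, s) in samples if now - t <= window]
    n = len(recent)
    sx = sum(s for s, _ in recent)
    sy = sum(p for _, p in recent)
    sxx = sum(s * s for s, _ in recent)
    sxy = sum(s * p for s, p in recent)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two distinct SleepPct values")
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a + b


samples = [
    (0, 9999.0, 0.5),    # stale sample, outside the window
    (100, 1600.0, 0.0),
    (110, 1700.0, 0.25),
    (120, 1800.0, 0.5),
    (130, 1900.0, 0.75),
]
baseline = fit_baseline(samples, now=130, window=60)
```

Restricting the fit to a recent window lets the estimate follow gradual workload changes, at the cost of using fewer data points.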

Page 26

Control with disturbance

Baseline estimation needs work
Cannot adjust to large workload change

Controller response still OK

[Figure: two panels, “Large Disturbance” and “Small Disturbance”. Each plots reference and actual throughput (Stmt/sec, 0 to 2500) and SleepPct (0 to 1) against Time (sec).]

Page 27

Dynamic Surge Protection

Systems can go from steady state … to overloaded without warning

[Figure: Internet traffic at steady state and, a few minutes later, in overload.]

Page 28

Resource Actions With Lead Times

Definition of lead time: delay from request to action taking effect

Examples:
From provisioning a server to its servicing requests
From de-provisioning a server to its being returned to a free pool
From increasing the size of a buffer pool to the pool being filled with data

Page 29

Effect of Lead Times on WAS Provisioning

[Figure: plot with the lead time marked.]

Page 30

Benefits of Proactive Provisioning

[Figure: plot with the lead time marked.]

Page 31

Autonomic Computing: Dynamic Surge Protection

[Figure: architecture. A workload drives an application on WAS 5.0 and DB2 v8.1, managed through the Deployment Manager (configuration management, monitoring). The Solution Manager (HVWS) contains a Forecaster (adaptive forecasting), a Performance Modeler (on-line capacity planning), and a Controller (on-demand actions), with signals BOPS, RT, and # WAS. Each autonomic element follows the Monitor, Analyze, Plan, Execute loop around Knowledge, with Sensors and Effectors.]

Page 32

CeBit Press

Reuters: IBM: Software Can Predict Computer Demand
C/Net: IBM offers details on autonomic software
InfoWorld: IBM to show new autonomic suite at CeBIT
IDG News: IBM to show off new autonomic technology
InformationWeek: More Autonomic Capabilities From IBM
InternetNews: IBM Spruces Up Autonomic Computing Offerings
cw360.com: IBM to demo autonomic technology at CeBIT

Page 33

Control Theory Book

Feedback Control of Computing Systems, Wiley-Interscience

Intended audience:
Computer scientists with minimal math background (geometric series) who want to apply techniques to practical problems
Control theorists looking for new applications

Status:
10 of 11 chapters at a “beta” level
Expected completion by end of June
Publication in 2004

Page 34

Table of Contents

1. Introduction (Qualitative control theory)

2. Model construction (statistics)

3. Z-Transforms and transfer functions (component models)

4. Block diagrams (system models)

5. First order systems

6. Higher order systems

7. State space models (multi-variate models)

8. Proportional control (feedback basics)

9. Other classical controllers (PID, tuning controllers)

10. State space feedback control (MIMO)

11. Advanced topics

Page 35

Progress Towards Project Goals

Develop/identify a formal approach
Control theory based

Demonstrate value
Lotus Notes – control w/o instabilities
Apache – simple way to optimize tuning parameters
DB2 Utilities Throttling
HotRod – handling resource actions with dead times
HotRod prototype – resource actions w/lead times

Evangelize
Feedback Control of Computing Systems, Wiley-Interscience
Tutorials: Almaden, Integrated Management, Stanford/Berkeley
Classes: Columbia?, University of Michigan?
AC toolkit integration

Page 36

1. "Using Control Theory to Achieve Service Level Objectives in Performance Management," S Parekh, N Gandhi, JL Hellerstein, D Tilbury, TS Jayram, J Bigus,  Real Time Systems Journal, 2002.

2. "Feedback Control of a Lotus Notes Server: Modeling and Control Design," N. Gandhi, S. Parekh, J. Hellerstein, and D.M. Tilbury, American Control Conference, 2001. (Best paper in session.)

3. "An Introduction to Control Theory With Applications to Computer Science," JL Hellerstein and S Parekh, ACM Sigmetrics, 2001.

4. "Using MIMO Feedback Control to Enforce Policies for Interrelated Metrics With Application to the Apache Web Server," Y Diao, N Gandhi, JL Hellerstein, S Parekh, and DM Tilbury. Network Operations and Management, 2002. (Best paper in conference.)

5. "MIMO Control of an Apache Web Server: Modeling and Controller Design," Y Diao, N Gandhi, JL Hellerstein, S Parekh, and DM Tilbury, American Control Conference, 2002. (Best paper in session.)

6. "Using Fuzzy Control to Maximize Profits in Service Level Management," Y Diao, JL Hellerstein, S Parekh. Accepted to the IBM Systems Journal, 2002.

7. "A First-Principles Approach to Constructing Transfer Functions for Admission Control in Computing Systems," JL Hellerstein, Y Diao, and S Parekh. Conference on Decision and Control, 2002.

8. "Generic On-Line Discovery of Quantitative Models for Service Level Management," Y Diao, F Eskesen, S Froehlich, JL Hellerstein, A Keller, L Spainhower, and M Surendra, IFIP Symposium on Integrated Management, 2003.

9. "On-Line Response Time Optimization of An Apache Web Server," Yixin Diao, Xue Lui, Steve Froehlich, Joseph L Hellerstein, Sujay Parekh, and Lui Sha. To appear in International Workshop on Quality of Service, 2003.

http://www.research.ibm.com/PM