1
Generic Adaptive Control
Contact: Joe Hellerstein
IBM Thomas J Watson Research Center
May 16, 2003
http://www.research.ibm.com/PM
2
Participants
Research: Joe Bigus (ABLE), Markus Debusman (University of Applied Science, Wiesbaden, Germany), Yixin Diao, Frank Eskesen, Steve Froehlich, Joe Hellerstein, Alexander Keller, Xue Liu (Univ. of Illinois), Sujay Parekh, Lui Sha (Univ. of Illinois), Maheswaran Surendra (team lead), Dawn Tilbury (Univ. of Michigan)
DB2: Randy Horman, Matt Huras, Ed Lassettre, Sam Lightstone, Kevin Rose, Adam Storm
Server Group: Lisa Spainhower
WebSphere: Carolyn Norton
HVWS: Noshir Wadia, Eric Ye
3
Example: Configuration & Optimization in WebSphere
[Figure: End Users → Web Servers → Application Servers → DB, with an Administrator tuning every tier]
The tuning knobs shown include KeepAlive Timeout, Number of Threads, MaxClients, DB Connections, Fast response cache, MaxRequestsPerChild, ThreadsPerChild, Max simultaneous requests, ListenBackLog, URL Cache, EJB threads, JVM heap size, and Servlet reload interval.
Challenges:
• Skill shortage
• Multiple vendors, multiple standards
• Mapping policies to IT "knobs"
4
Project Goals
• Develop a formal basis for resource management problems with dynamics (especially policy enforcement)
• Demonstrate the practical value of the approach
• Evangelize the approach: book, tutorials, classes; methodology and tools
5
Agenda
• Basics of Control Theory
• Regulating concurrent users in Lotus Notes: pole placement design
• Regulating utilizations in Apache
• Optimizing response times in Apache
• Throttling DB2 utilities
• DB2 self-tuning memory
• Regulating service levels in a multi-tiered eCommerce system (HotRod)
• Educational efforts (book, tutorials)
• Summary
6
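Control of Lotus Notes eMail Server
[Figure: a workload generator sends RPCs to the Lotus Notes Server; the AutoTune Agent compares the measured queue length with the target queue length set by the Administrator and adjusts MaxUsers]
[Plots: queue length under controller gains K = 0.1 (slow), K = 1 (better), K = 5 (bad), and with no control (uncontrolled)]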
7
System Identification: Estimate Transfer Function
Notes Server: input u(t) = MaxUsers, output q(t) = actual queue length
Dynamic model: $q(t) = a_1 q(t-1) + b_0 u(t)$
Estimated parameters: $a_1 = 0.319$, $b_0 = 0.550$, $R^2 = 0.79$
[Plot: predicted queue length (QL) versus observed QL, both axes 0 to 100]
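Parameters like these come from a least-squares fit of the first-order model to logged data. The following is a minimal sketch of such a fit. It assumes the model form q(t) = a1·q(t-1) + b0·u(t) and solves the 2×2 normal equations directly. The function name `fit_first_order` is hypothetical. The demo's random MaxUsers excitation and noise level are made up and are not the project's experiment.

```python
import random


def fit_first_order(u, q):
    """Least-squares fit of q(t) = a1*q(t-1) + b0*u(t) for t = 1..n-1.

    u: MaxUsers settings, q: measured queue lengths (same length).
    Returns (a1, b0, r_squared)."""
    # Regressors x1 = q(t-1) and x2 = u(t); target y = q(t)
    rows = [(q[t - 1], u[t], q[t]) for t in range(1, len(q))]
    s11 = sum(x1 * x1 for x1, _, _ in rows)
    s12 = sum(x1 * x2 for x1, x2, _ in rows)
    s22 = sum(x2 * x2 for _, x2, _ in rows)
    s1y = sum(x1 * y for x1, _, y in rows)
    s2y = sum(x2 * y for _, x2, y in rows)
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("regressors are collinear; vary MaxUsers more")
    # Solve the 2x2 normal equations with Cramer's rule
    a1 = (s1y * s22 - s2y * s12) / det
    b0 = (s2y * s11 - s1y * s12) / det
    # R^2 of the one-step-ahead predictions
    ys = [y for _, _, y in rows]
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (a1 * x1 + b0 * x2)) ** 2 for x1, x2, y in rows)
    sst = sum((y - mean_y) ** 2 for y in ys)
    return a1, b0, 1.0 - sse / sst


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical excitation: MaxUsers varied randomly between 50 and 150
    u = [rng.uniform(50, 150) for _ in range(200)]
    q = [0.0]
    for t in range(1, len(u)):
        q.append(0.319 * q[-1] + 0.550 * u[t] + rng.gauss(0, 2.0))
    print(fit_first_order(u, q))
```

The input must vary enough that q(t-1) and u(t) are not collinear. A constant MaxUsers setting would make the normal equations singular, which is why the sketch raises an error in that case.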
8
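Controller Design
[Block diagram: the reference minus the Sensor S(z) output forms the error e(t), which drives the Controller G(z); the controller sets the input of the Notes Server N(z), whose output is measured by the sensor and fed back]
Closed-loop transfer function: $H(z) = \frac{G(z)N(z)}{1 + G(z)N(z)S(z)}$
Simplified integral control law: $u(t) = u(t-1) + K e(t)$
Design for the "poles" of H(z)
[Plots: closed-loop responses for K = 1 and K = 5]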
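Computing the closed-loop poles for each gain shows why the choice of K matters. This sketch uses parameter values a1 = 0.319 and b0 = 0.550, which are our reading of the identification results, for the model q(t) = a1·q(t-1) + b0·u(t). It also assumes the controller sees the queue length one sample late, e(t) = r - q(t-1), playing the role of the sensor. Under those assumptions the characteristic polynomial is z² - (1 + a1 - b0·K)z + a1. The helper names are hypothetical.

```python
import cmath


def closed_loop_poles(a1, b0, gain):
    """Poles of the loop q(t) = a1*q(t-1) + b0*u(t), u(t) = u(t-1) + K*e(t),
    assuming e(t) = r - q(t-1) (queue length reported one sample late)."""
    # Characteristic polynomial: z^2 - (1 + a1 - b0*K) z + a1 = 0
    c1 = 1.0 + a1 - b0 * gain
    disc = cmath.sqrt(c1 * c1 - 4.0 * a1)
    return (c1 + disc) / 2.0, (c1 - disc) / 2.0


def simulate(a1, b0, gain, target, steps):
    """Closed-loop queue-length trajectory starting from rest."""
    q_prev, u_prev, out = 0.0, 0.0, []
    for _ in range(steps):
        e = target - q_prev            # error from the most recent measurement
        u = u_prev + gain * e          # integral control law
        q = a1 * q_prev + b0 * u       # identified plant model
        out.append(q)
        q_prev, u_prev = q, u
    return out


for k in (0.1, 1.0, 5.0):
    radius = max(abs(p) for p in closed_loop_poles(0.319, 0.550, k))
    print(f"K={k}: largest pole magnitude {radius:.3f}")
```

Under these assumptions, K = 0.1 leaves a pole near 0.92, which gives a slow response. K = 1 gives complex poles of magnitude about 0.56. K = 5 puts a pole outside the unit circle, so the loop is unstable. This ordering is consistent with the slow, better, and bad labels on the Lotus Notes plots.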
9
Control of Apache Server
[Figure: a workload generator sends Web service requests to the Apache System; the AutoTune Agent measures CPU utilization and memory utilization and sets MaxClients and KeepAlive TO (timeout); the Administrator supplies policies and receives reports]
Contribution: multiple input, multiple output (MIMO) control
10
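Apache Control Enablements
[Figure: mod_controller (close-up) inside the Apache Web Server. The master process manages the worker processes, with mod_controller issuing SPAWN and KILL; shared memory holds KeepAlive, MaxClients, and service-time statistics; CPU and memory utilization come from the OS (procfs); an internal controller, or an external controller through a Get/Set interface, adjusts the settings; an external RT probe measures response time. Legend: HTTP, inter-process, value flow, process, RT info, GET/SET.]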
11
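Model Structure
The transfer function relationship between the Apache server's inputs, KeepAlive (KA) and MaxClients (MC), and its outputs, CPU and memory (MEM) utilization:
• MIMO model:
$$\begin{bmatrix} \mathrm{CPU} \\ \mathrm{MEM} \end{bmatrix} = \begin{bmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{bmatrix} \begin{bmatrix} \mathrm{KA} \\ \mathrm{MC} \end{bmatrix}$$
• Two SISO models: the same structure with $G_{12} = G_{21} = 0$, so KA drives only CPU (through $G_{11}$) and MC drives only MEM (through $G_{22}$)
SISO vs. MIMO: the SISO approach assumes the cross terms are negligible.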
12
Model Comparison
[Figure: measured vs. model-predicted CPU and MEM utilization (0 to 1) under the KA (0 to 20) and MC (0 to 1000) inputs over 0 to 1500 s; left panel: two SISO models, right panel: MIMO model]
• CPU: the SISO model fails because MC and KA both affect CPU; the MIMO model captures this relationship.
• MEM: both models do a good job of predicting the system response.
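The SISO/MIMO difference can be reproduced on synthetic data. The sketch below assumes a first-order MIMO structure y(k+1) = A·y(k) + B·u(k) with made-up coefficients in which MC has a cross effect on CPU. It then fits the CPU equation by ordinary least squares, once without the cross terms (SISO) and once with them (MIMO). All coefficients, input ranges, and noise levels are hypothetical.

```python
import random


def least_squares(rows, ys):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y,
    solved with Gauss-Jordan elimination and partial pivoting."""
    n = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, ys))]
         for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(n):
            if i != col:
                f = m[i][col] / m[col][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def sse(rows, ys, w):
    """Sum of squared one-step prediction errors."""
    return sum((y - sum(wi * xi for wi, xi in zip(w, r))) ** 2
               for r, y in zip(rows, ys))


# Hypothetical "true" system in deviation variables: MC also drives CPU
rng = random.Random(1)
ka = [rng.uniform(-10, 10) for _ in range(300)]
mc = [rng.uniform(-500, 500) for _ in range(300)]
cpu, mem = [0.0], [0.0]
for k in range(299):
    cpu.append(0.5 * cpu[k] + 0.02 * ka[k] + 0.0004 * mc[k] + rng.gauss(0, 0.001))
    mem.append(0.6 * mem[k] + 0.0005 * mc[k] + rng.gauss(0, 0.001))

targets = cpu[1:]
# SISO: CPU(k+1) = a*CPU(k) + b*KA(k); the MC cross term is assumed zero
siso_rows = [[cpu[k], ka[k]] for k in range(299)]
# MIMO: CPU(k+1) = a11*CPU(k) + a12*MEM(k) + b11*KA(k) + b12*MC(k)
mimo_rows = [[cpu[k], mem[k], ka[k], mc[k]] for k in range(299)]

w_siso = least_squares(siso_rows, targets)
w_mimo = least_squares(mimo_rows, targets)
sse_siso = sse(siso_rows, targets, w_siso)
sse_mimo = sse(mimo_rows, targets, w_mimo)
print(f"CPU one-step SSE: SISO {sse_siso:.4f}, MIMO {sse_mimo:.6f}")
```

The SISO fit leaves the MC-driven part of CPU in its residuals, so its error is orders of magnitude larger. Fitting the MEM equation the same way would show both structures doing well, because in this synthetic system MEM depends only on MC.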
13
Optimization of Apache Server
[Figure: a workload generator sends Web service requests to the Apache System; the AutoTune Agent observes response time and adjusts MaxClients]
14
Apache Operation
[Figure: new users open new connections, which wait in the TCP accept queue until one of the MaxClients Apache workers takes them; a connection ends with Close() or Timeout()]
Heuristic: find the smallest MaxClients that eliminates TCP queueing.
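The slide states the heuristic but not how the search is carried out. One way to sketch it is a binary search. This assumes that TCP accept-queue length never grows as MaxClients increases. It also assumes a hypothetical probe, `queue_length_at(max_clients)`, that runs the workload at a given setting and reports the queue length.

```python
def smallest_max_clients(queue_length_at, low, high):
    """Smallest MaxClients in [low, high] with an empty TCP accept queue.

    queue_length_at: hypothetical probe that runs the workload with the given
    MaxClients and returns the observed accept-queue length. Assumes the
    queue length never increases as MaxClients grows."""
    if queue_length_at(high) > 0:
        raise ValueError("queueing persists even at the largest setting")
    # Invariant: the answer lies in [low, high]
    while low < high:
        mid = (low + high) // 2
        if queue_length_at(mid) == 0:
            high = mid       # mid suffices; try smaller values
        else:
            low = mid + 1    # connections still wait; need more workers
    return low


# Usage with a synthetic probe: queueing vanishes at 320 workers
print(smallest_max_clients(lambda mc: max(0, 320 - mc), 1, 1024))
```

On a live server, each probe costs a full measurement interval. Noisy readings would also need averaging or a tolerance in place of the exact `== 0` test.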
15
Impact of MaxClients
[Plot: response time (sec.), 0 to 100, versus MaxClients, 0 to 900, with the Apache defaults marked; a second sketch shows response time versus MaxClients]
AutoTune Using Fuzzy Rules
[Figure: fuzzy controller made of fuzzification, an inference mechanism with a rule base, and defuzzification, with a d/dt block on its input]
• Fuzzification
– Convert numeric variables to linguistic variables
– Characterized by membership functions
• Rule base
– IF-THEN rules
– Using linguistic variables
• Inference mechanism
– Activate the fuzzy rules (IF)
– Combine the rule actions (THEN)
• Defuzzification
– Convert linguistic variables to numeric variables
Constructing Fuzzy Rules
[Figure: response time (RT) versus MaxClients, annotated with where each of Rules 1 to 4 applies]
• Rule 1: IF change-in-MaxClients is poslarge and change-in-RT is neglarge THEN next-change-in-MaxClients is poslarge
• Rule 2: IF change-in-MaxClients is neglarge and change-in-RT is poslarge THEN next-change-in-MaxClients is poslarge
• Rule 3: IF change-in-MaxClients is neglarge and change-in-RT is neglarge THEN next-change-in-MaxClients is neglarge
• Rule 4: IF change-in-MaxClients is poslarge and change-in-RT is poslarge THEN next-change-in-MaxClients is neglarge
Decision making:
– Increment direction
– Increment size
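The four rules can be turned into a small fuzzy controller. This sketch is an illustration built on assumptions, not AutoTune's implementation:

- Inputs are normalized to [-1, 1] with hypothetical scale factors.
- `poslarge` and `neglarge` use complementary linear membership functions.
- Rules fire with product inference.
- The output is a weighted average of singleton consequents: +1 for poslarge and -1 for neglarge.

```python
def pos_large(x):
    """Membership in 'poslarge' for a normalized change x (assumed linear ramp)."""
    x = max(-1.0, min(1.0, x))
    return (x + 1.0) / 2.0


def neg_large(x):
    """Membership in 'neglarge'; complementary to pos_large."""
    return 1.0 - pos_large(x)


# Rule base from the slide: (change-in-MaxClients term, change-in-RT term, output)
RULES = [
    (pos_large, neg_large, +1.0),  # Rule 1
    (neg_large, pos_large, +1.0),  # Rule 2
    (neg_large, neg_large, -1.0),  # Rule 3
    (pos_large, pos_large, -1.0),  # Rule 4
]


def next_change(d_mc, d_rt, mc_scale=50.0, rt_scale=1.0, step=50.0):
    """Fuzzy hill-climbing step for MaxClients.

    d_mc, d_rt: last change in MaxClients and in response time (RT).
    mc_scale, rt_scale, step: hypothetical normalization and step-size constants."""
    x_mc, x_rt = d_mc / mc_scale, d_rt / rt_scale           # fuzzification inputs
    weights = [m1(x_mc) * m2(x_rt) for m1, m2, _ in RULES]  # rule activation (IF)
    total = sum(weights)
    if total == 0.0:
        return 0.0
    # Combine the rule actions (THEN) by weighted-average defuzzification
    return step * sum(w * c for w, (_, _, c) in zip(weights, RULES)) / total


print(next_change(50, -1.0), next_change(50, 1.0))
```

With these particular choices, the rule base collapses algebraically to -step·x_mc·x_rt. MaxClients keeps moving in the same direction while RT falls and reverses when RT rises, with larger steps when both changes are large. A caller would apply `max_clients += next_change(last_dmc, last_drt)` once per control interval, seeding the first move with a small nonzero change.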
AutoTune Controlling MaxClients on Apache
[Plot: MaxClients moving from the Apache default to the optimized setting]
AutoTune Response to a New Workload
[Plot: after the workload changes, MaxClients moves from the old optimized setting to a new optimized setting]
20
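DB2 UDB Utilities Throttling (SMART Project)
[Figure: Backup, Restore, and Re-Balance utilities run alongside the UDB Engine on the server; disk and CPU utilizations are regulated toward a target utilization by adjusting the utilities' sleep delay]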
21
Backup and Restore Process Model
[Figure: the db2agent process, db2bm processes, DB2 backup/restore buffers, db2med processes, DB2 I/O processors, and the OS, moving data between the database and the backup source/destination]
22
Success Is: High System Utilization, Small Effect on User Throughput
[Plot: % utilization over time; the gap is due to reduced utilization in sleep periods. Note: this is a longer-time averaged value than on slide 5.]
[Chart: user throughput (T.P.) with the utility, relative to throughput without the utility (1)]
23
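Throttling a Single Utility
• Standard PI controller tries to reach E = 0
• Assume: linear effect of throttling on Y
[Block diagram: the DBA sets the reference R; Baseline Estimation uses the workload (WL) to estimate the baseline throughput Y*; Compute Degradation forms the throughput degradation M from Y and Y*; the Controller acts on the control error E and sets the throttling input U of DB2 (utility + workload), whose measured throughput is Y; Model Estimation identifies the parameters (a, b) characterizing DB2. A further annotation reads "max thruput from utility + workload".]
Throughput degradation: $M = \frac{Y^* - Y}{Y^*}$
Control error: $E = R - M$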
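Below is a minimal sketch of the PI loop. It assumes the linear effect named on the slide takes the hypothetical form M = gain·V, where V is the fraction of time the utility runs (1 minus the sleep fraction). It also assumes the degradation is observed one interval after each setting. The plant gain and the PI gains are made up. The controller is written in velocity (incremental) form, with the run fraction clamped to [0, 1].

```python
def throttle(target, plant_gain=0.5, kp=0.5, ki=0.5, steps=100):
    """Velocity-form PI control of a utility's run fraction V.

    Hypothetical plant (linear-effect assumption): the throughput degradation
    seen in interval k is M(k) = plant_gain * V(k-1), where V is the fraction
    of time the utility runs (1 - sleep fraction). target is the degradation
    reference R. Returns the histories of V and M."""
    v, e_prev = 1.0, 0.0            # start unthrottled
    vs, ms = [], []
    for _ in range(steps):
        m = plant_gain * v          # degradation measured after the last setting
        e = target - m              # control error E = R - M
        v += kp * (e - e_prev) + ki * e   # PI increment
        v = min(1.0, max(0.0, v))   # a run fraction must stay in [0, 1]
        e_prev = e
        vs.append(v)
        ms.append(m)
    return vs, ms


vs, ms = throttle(target=0.2)
print(f"run fraction {vs[-1]:.3f}, degradation {ms[-1]:.3f}")
```

For a 20% degradation target, this loop settles at V = 0.4, meaning the utility sleeps 60% of the time. Because the controller is incremental, clamping V also keeps the integral action from winding up when a target cannot be reached.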
24
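Baseline Measurement: Idling
[Timeline figure: processes P1, P2, P3 read the control value; control points switch the utility between Sleep_Tm 100% and Sleep_Tm n%; measurement intervals Start1 to End1 and Start2 to End2, with interval marks c*LT and d*LT]
"Loop" throughput: $r_l = \frac{p^1_{End} - p^1_{Start}}{t^1_{End} - t^1_{Start}}$
"Other" (sleep) throughput: $r_o = \frac{p^2_{End} - p^2_{Start}}{t^2_{End} - t^2_{Start}}$
• "Start" is the perf output after all P_i have read the new control value.
• "End" is from the output closest to the control change.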
25
Baseline Estimation
• Over time, record the sequence {(t_i, p_i, s_i)}, where t = time, p = perf at time t, s = SleepPct at time t
• Fit a "curve" to this data to get the model M, e.g., over some fixed time interval of the past
[Plot: p versus s, with s ranging up to 1]
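Baseline estimation can be sketched as a sliding-window regression. The slide only says "fit a curve". This sketch assumes a straight line p = α + β·s over the last `window_seconds` of samples. It also assumes the baseline is the fitted perf at SleepPct = 1, with the utility idle, consistent with measuring the baseline by idling. Both choices and the class name `BaselineEstimator` are hypothetical.

```python
from collections import deque


class BaselineEstimator:
    """Sliding-window baseline estimate from (t, p, s) samples.

    Assumptions (not from the slides): the curve is a straight line
    p = alpha + beta * s, and the baseline is its value at s = 1
    (utility sleeping 100% of the time)."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.samples = deque()  # (t, p, s), oldest first

    def add(self, t, p, s):
        self.samples.append((t, p, s))
        # Keep only samples from the fixed time interval of the past
        while self.samples and self.samples[0][0] < t - self.window:
            self.samples.popleft()

    def baseline(self):
        n = len(self.samples)
        if n < 2:
            raise ValueError("need at least two samples")
        ss = [s for _, _, s in self.samples]
        ps = [p for _, p, _ in self.samples]
        s_mean, p_mean = sum(ss) / n, sum(ps) / n
        sxx = sum((s - s_mean) ** 2 for s in ss)
        if sxx == 0.0:
            raise ValueError("SleepPct did not vary within the window")
        sxy = sum((s - s_mean) * (p - p_mean) for s, p in zip(ss, ps))
        beta = sxy / sxx
        alpha = p_mean - beta * s_mean
        return alpha + beta * 1.0  # extrapolate to SleepPct = 1
```

One consequence of a fixed window is that, after a large workload change, the fit mixes samples from both regimes until the old samples age out.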
26
Control with Disturbance
• Baseline estimation needs work: it cannot adjust to a large workload change
• Controller response is still OK
[Plots, left: large disturbance (0 to 6000 s); right: small disturbance (0 to 7000 s). Each shows throughput (Stmt/sec, 0 to 2500), reference vs. actual, and SleepPct (0 to 1) versus time (sec).]
27
Dynamic Surge Protection
Systems can go from steady state … to overloaded without warning.
[Figure: Internet traffic to a system in steady state and, a few minutes later…, overloaded]
28
Resource Actions With Lead Times
Definition of lead time: the delay from a request to the action taking effect.
Examples:
• From provisioning a server to its servicing requests
• From de-provisioning a server to its being returned to a free pool
• From increasing the size of a buffer pool to the pool being filled with data
29
Effect of Lead Times on WAS Provisioning
[Figure: provisioning actions and their lead time]
30
Benefits of Proactive Provisioning
[Figure: proactive provisioning ahead of the lead time]
Autonomic Computing: Dynamic Surge Protection
[Figure: a workload drives an application on WAS 5.0 (#WAS: 1, 2, 3) and DB2 v8.1, with the Deployment Manager (configuration management) and monitoring (BOPS, RT). The HVWS Solution Manager comprises a Forecaster, a Controller, and a Performance Modeler, providing adaptive forecasting, on-line capacity planning, and on-demand actions. The loop is drawn as an autonomic element: Monitor, Analyze, Plan, and Execute around shared Knowledge, connected through Sensors and Effectors.]
32
CeBIT Press
• Reuters: IBM: Software Can Predict Computer Demand
• C/Net: IBM offers details on autonomic software
• InfoWorld: IBM to show new autonomic suite at CeBIT
• IDG News: IBM to show off new autonomic technology
• InformationWeek: More Autonomic Capabilities From IBM
• InternetNews: IBM Spruces Up Autonomic Computing Offerings
• cw360.com: IBM to demo autonomic technology at CeBIT
33
Control Theory Book: Feedback Control of Computing Systems
Wiley-Interscience
Intended audience:
• Computer scientists with minimal math background (geometric series) who want to apply the techniques to practical problems
• Control theorists looking for new applications
Status:
• 10 of 11 chapters at a "beta" level
• Expected completion by end of June
• Publication in 2004
34
Table of Contents
1. Introduction (Qualitative control theory)
2. Model construction (statistics)
3. Z-Transforms and transfer functions (component models)
4. Block diagrams (system models)
5. First order systems
6. Higher order systems
7. State space models (multi-variate models)
8. Proportional control (feedback basics)
9. Other classical controllers (PID, tuning controllers)
10. State space feedback control (MIMO)
11. Advanced topics
35
Progress Towards Project Goals
• Develop/identify a formal approach
– Control theory based
• Demonstrate value
– Lotus Notes: control w/o instabilities
– Apache: simple way to optimize tuning parameters
– DB2 utilities throttling
– HotRod: handling resource actions with dead times
– HotRod prototype: resource actions w/ lead times
• Evangelize
– Feedback Control of Computing Systems, Wiley-Interscience
– Tutorials: Almaden, Integrated Management, Stanford/Berkeley
– Classes: Columbia?, University of Michigan?
– AC toolkit integration
1. "Using Control Theory to Achieve Service Level Objectives in Performance Management," S Parekh, N Gandhi, JL Hellerstein, D Tilbury, TS Jayram, and J Bigus, Real Time Systems Journal, 2002.
2. "Feedback Control of a Lotus Notes Server: Modeling and Control Design," N Gandhi, S Parekh, J Hellerstein, and DM Tilbury, American Control Conference, 2001. (Best paper in session.)
3. "An Introduction to Control Theory With Applications to Computer Science," JL Hellerstein and S Parekh, ACM Sigmetrics, 2001.
4. "Using MIMO Feedback Control to Enforce Policies for Interrelated Metrics With Application to the Apache Web Server," Y Diao, N Gandhi, JL Hellerstein, S Parekh, and DM Tilbury, Network Operations and Management, 2002. (Best paper in conference.)
5. "MIMO Control of an Apache Web Server: Modeling and Controller Design," Y Diao, N Gandhi, JL Hellerstein, S Parekh, and DM Tilbury, American Control Conference, 2002. (Best paper in session.)
6. "Using Fuzzy Control to Maximize Profits in Service Level Management," Y Diao, JL Hellerstein, and S Parekh, accepted to the IBM Systems Journal, 2002.
7. "A First-Principles Approach to Constructing Transfer Functions for Admission Control in Computing Systems," JL Hellerstein, Y Diao, and S Parekh, Conference on Decision and Control, 2002.
8. "Generic On-Line Discovery of Quantitative Models for Service Level Management," Y Diao, F Eskesen, S Froehlich, JL Hellerstein, A Keller, L Spainhower, and M Surendra, IFIP Symposium on Integrated Management, 2003.
9. "On-Line Response Time Optimization of an Apache Web Server," Y Diao, X Liu, S Froehlich, JL Hellerstein, S Parekh, and L Sha, to appear in International Workshop on Quality of Service, 2003.