Autonomic Runtime System: Design and Evaluation for SAMR Applications *
-
Autonomic Runtime System: Design and Evaluation for SAMR Applications*
Salim Hariri
High Performance Distributed Computing Laboratory
The University of Arizona
http://www.ece.arizona.edu/~hpdc
Supported by: NSF, DOE, DARPA, Intel, Raytheon and AOL grants
-
Outline
Motivation and objectives
Autonomia: An Autonomic Control and Management Environment
Self-Optimization
Self-Protection
Concluding Remarks
-
Information Technology and Biology Convergence
Our system design methods and management tools seem to be inadequate for handling the complexity, size, and heterogeneity of today's and future information systems.
Biological systems have evolved strategies to cope with dynamic, complex, highly uncertain constraints.
-
Current Design and Development of Computing Systems
Different fields evolved separately and targeted few domains/applications.
-
New System Construction: Part-to-the-Whole Approach
Adds complexity
High cost
Interoperability issues
-
Autonomic Computing System: Holistic Approach
[Figure labels: Autonomic Building Block; Self-Healing Component; Self-Optimizing Component; Self-Configuring Component; Self-Protecting Component; Secure, Fault-Tolerant System; High-Performance, Fault-Tolerant System; Autonomic Computing Systems.]
-
Autonomia: An Autonomic Control and Management Environment
Provide dynamically programmable control and management services to support the development and deployment of autonomic applications
Provide Autonomic Runtime Services (self-healing, self-configuring, self-protecting, self-optimizing)
Provide automated deployment, registration, and discovery of autonomic components
Provide automated configuration of autonomic applications and system resources
-
[Figure: Autonomia architecture. Labels: Application Management Editor; User's Application; Autonomic Runtime System; Application Runtime Manager (ARM); Event Server; Coordinator; Monitoring & Analysis Engine; Scheduling Engine; Planning Engine; Policy Engine; Knowledge; AIK Repository (ACA Specifications, Policy, Component State, Resource State); Autonomic Runtime Services (Self-Protecting, Self-Optimizing, Self-Healing, Self-Configuring); VEE: Virtual Execution Environment, shown as VEE(App1), VEE(App2), ..., VEE(Appn); CRM: Component Runtime Manager; components ACA1, ACA2, ACA3, ..., ACAj, ..., ACAm; Computational Component; High Performance Computing Environment (HPCE).]
-
Autonomia Process Flow
[Figure: the Autonomia architecture diagram from the previous slide, annotated with numbered process-flow steps 1 to 4.]
-
Application Execution
-
Self-Optimizing: Design and Evaluation
-
Current Implementations Intractable for Large Problems
-
Wildfire Autonomic Runtime Manager (WARM)
[Figure: WARM overview. Labels: Dynamic Data Driven Wildfire Model; Analysis Objectives; Natural Region Characterization; Active Performance Model; Knowledge Repository; Autonomic Scheduling; Execution; Heterogeneous, Dynamic Computational Environment; natural regions NR1 (Burned), NR2 (Burning), NR3 (Unburned); Actual and Predicted; Sensors, Survey Flights, GPS, Satellite; Wild Fire Model Development Environment.]
Model inputs listed in the figure:
Regional Weather
Terrain Characteristics
Local Weather: temperature, humidity, wind speed, wind direction, clouds, precipitation, lightning
Fire Behavior: location, intensity, geometry, propagation
Fuel Conditions
Smoke: locations and concentration
Firefighting Activities
-
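Forest Fire Cell Space: Dynamic Repartitioning
[Figure: initial partitioning of the cell space into natural regions (labels NR2, NR3, NR5), then repartitioning in which the NR2 burning zone gets finer gridding and the burned zone gets coarser gridding.]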
-
Wild Fire Simulation Physics
The entire area is represented as a 2-D cell space.
The weather and vegetation conditions are assumed to be uniform within a cell, but may vary across the cell space.
When a cell is ignited, its state changes from unburned to burning. During its burning phase, the fire propagates to its eight neighbors along the eight directions.
[Figure: a burning cell and its eight neighbors.]
As the simulation time advances, the fire propagates from the first ignition cell to other cells.
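The propagation rule described on this slide can be sketched as a small cellular automaton. This is only an illustration: the names `State`, `step`, and `burning_cells` are hypothetical, and the sketch assumes every burning cell ignites all eight unburned neighbors in one step and then becomes burned, whereas the slide notes that weather and vegetation may vary from cell to cell.

```python
from enum import Enum


class State(Enum):
    UNBURNED = 0
    BURNING = 1
    BURNED = 2


# Offsets of the eight neighbors of a cell.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1),
             (0, -1),           (0, 1),
             (1, -1),  (1, 0),  (1, 1)]


def step(grid):
    """Advance the cell space by one time step and return the new grid."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] is not State.BURNING:
                continue
            # Assumption: a cell burns for exactly one time step.
            new[r][c] = State.BURNED
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < rows and 0 <= nc < cols
                if inside and grid[nr][nc] is State.UNBURNED:
                    new[nr][nc] = State.BURNING
    return new


def burning_cells(grid):
    """Count burning cells; this count is the workload of the time step."""
    return sum(cell is State.BURNING for row in grid for cell in row)
```

Because the set of burning cells moves outward as a front, the workload is concentrated in different parts of the cell space at different time steps, which is why a static partition drifts out of balance.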
-
Parallel Wild Fire Simulation Analysis
[Figure: the composition of the execution time T(t) at time step t for 4 processors, showing the per-processor computation times Tcomp(1,t), Tcomp(2,t), Tcomp(3,t), Tcomp(4,t) and the broadcast time Tbcast(t).]
To decrease T(t), make the computation time on each processor as even as possible, which minimizes the synchronization time.
The Imbalance Ratio (IR) characterizes the degree of imbalance.
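The formula for the Imbalance Ratio is not preserved in this transcript. A minimal sketch, assuming IR is the difference between the largest and smallest per-processor computation times relative to the smallest, in percent; this assumption matches the chart label "Maximum and Minimum Time Difference Percentage" and reproduces the plotted values at time step 300. The function names are hypothetical, and the timing lists are the time-step-300 values for 8 processors from the transcript's chart data.

```python
def imbalance_ratio(tcomp):
    """Imbalance ratio in percent for one time step.

    Assumed definition: (max - min) / min * 100 over the per-processor
    computation times. The slide's own formula was lost in extraction.
    """
    fastest, slowest = min(tcomp), max(tcomp)
    if fastest <= 0:
        raise ValueError("computation times must be positive")
    return (slowest - fastest) / fastest * 100.0


def needs_rebalance(tcomp, threshold_percent=50.0):
    """True when the imbalance ratio exceeds the given threshold."""
    return imbalance_ratio(tcomp) > threshold_percent


# Per-processor execution times (seconds) at time step 300.
without_opt_step_300 = [0.118836, 0.139991, 0.065956, 0.066033,
                        0.06593, 0.065664, 0.065959, 0.066023]
with_opt_step_300 = [0.077662, 0.088655, 0.079981, 0.073166,
                     0.073056, 0.072828, 0.073196, 0.072994]

for label, times in [("static", without_opt_step_300),
                     ("dynamic", with_opt_step_300)]:
    print(label, round(imbalance_ratio(times), 2), needs_rebalance(times))
```

With a 50% threshold, only the static partition would trigger a repartitioning at this time step.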
-
Fire Simulation Example
[Figure: the partition of the cell space among processors P1, P2, P3, and P4 at t = 1, t = N, and t = 2N.]
The example shows the imbalance ratio at different time steps. As the simulation advances, the imbalance gets worse.
-
Self-Optimization
Monitor the state of the fire simulation to obtain the computation load at any time step.
Monitor the states of the underlying system to obtain the computation capacity.
Monitor the imbalance ratio at any time step. If the imbalance ratio is larger than a given threshold, dynamically adjust the workload among processors at run time.
-
Self-Optimization Algorithm
Obtain the total workload at time t. [equation]
Estimate the computation time of one burning cell on processor p, taking the system load into account. [equation], where L(p,t) is the length of the CPU queue on processor p at time t.
Calculate the average execution time of one burning cell. [equation]
-
Self-Optimization Algorithm (contd)
To balance the load on each processor, the processor allocation factor (PAF) is defined as inversely proportional to the processor execution time with respect to the average execution time. [equation]
Calculate the Processor Load Ratio (PLR), which characterizes the capacities of the processors. [equation]
Note that: [equation]
Calculate the workload assigned to processor p at time step t, workload(p,t). [equation]
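The equations for these steps are not preserved in the transcript, so the sketch below fills them with assumed forms that follow the wording of the slides: the per-cell time on processor p scales with the CPU queue length as `base * (L(p,t) + 1)`, PAF is the average per-cell time divided by the processor's per-cell time, PLR is PAF normalized so that the ratios sum to 1, and workload(p,t) is the total number of burning cells times PLR. The function name, the scaling rule, and the rounding scheme are all assumptions made for illustration.

```python
def assign_workload(total_burning_cells, base_cell_time, queue_lengths):
    """Split the burning cells among processors according to their load.

    Returns (plr, cells): the Processor Load Ratios and the integer number
    of burning cells assigned to each processor.
    """
    n = len(queue_lengths)
    # Assumed load model: each job already in the CPU queue adds one
    # equal share of delay to the time needed for one burning cell.
    cell_time = [base_cell_time * (q + 1) for q in queue_lengths]
    avg_time = sum(cell_time) / n

    # PAF: inversely proportional to the processor's per-cell time,
    # relative to the average per-cell time.
    paf = [avg_time / t for t in cell_time]

    # PLR: PAF normalized so that the ratios sum to 1.
    total_paf = sum(paf)
    plr = [f / total_paf for f in paf]

    # Integer assignment: round down, then hand the leftover cells to
    # the processors with the largest fractional parts.
    exact = [total_burning_cells * r for r in plr]
    cells = [int(x) for x in exact]
    leftover = total_burning_cells - sum(cells)
    order = sorted(range(n), key=lambda p: exact[p] - cells[p], reverse=True)
    for p in order[:leftover]:
        cells[p] += 1
    return plr, cells


plr, cells = assign_workload(1000, 1.0, [0, 0, 1, 3])
print(cells)
```

In this usage example the two idle processors receive the largest shares, and the processor with three queued jobs receives the smallest.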
-
Fire Simulation Example with Self-Optimization Algorithm
[Figure: the partition of the cell space among processors P1, P2, P3, and P4 at t = 1, t = N, and t = 2N.]
With the self-optimization algorithm, the imbalance is dramatically decreased.
-
Wildfire Autonomic Runtime Manager
-
Experimental results
Problem size is 64K and the number of processors is 8.
With self-optimization, the imbalance ratio is kept close to the threshold. Without self-optimization, the imbalance ratio grows as the simulation advances.
[Figure: Imbalance Ratio Comparison with threshold = 50%. x-axis: Time Step; y-axis: Imbalance Ratio (%). Series: Static Partition without Self-Optimization; Dynamic Partition with Self-Optimization. In the chart data, the static partition's imbalance ratio rises from about 3% at the first plotted point to about 388% at the last, while the dynamic partition's ratio stays below about 70%.]
-
Experimental results (contd)
Problem size is 64K and the number of processors is 8.
Without self-optimization, the per-processor execution times for one time step become increasingly uneven as the simulation advances.
With self-optimization, the per-processor execution times for one time step stay almost evenly distributed as the simulation advances.
[Figure: Execution Time Distribution of Processors without Self-Optimization. x-axis: Processor Number; y-axis: Execution Time (s). Series: Time Step 1; Time Step 300; Time Step 600.]
[Figure: Execution Time Distribution of Processors with Self-Optimization. x-axis: Processor Number; y-axis: Execution Time (s). Series: Time Step 1; Time Step 300; Time Step 600.]
-
Experimental results (contd)
Problem size (256*256 = 64K)
Problem size (512*512 = 256K)
-
Memory-based Proactive Runtime Partitioning
Optimize performance using a memory-based approach: minimize the number of page faults and balance work among processors
Memory function model for RM3D: [equation], where W is the application workload and a_i are PF-based heuristics
Memory-based processor grouping and workload partitioning
Processors are placed in lightly (X-), moderately (X), or heavily (X+) loaded groups based on a 2-level threshold, with N-, N, and N+ processors respectively
Work in group X- transferred to X+, with unit of work being [expression]
Sort processors in X+ in ascending order of available memory
Checks are made for processors with corresponding least available memory
Threshold conditions for work transfers must be met
After work transfers, new memory-based work partitioning ratios are computed as [equation]
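The grouping step can be illustrated with a short sketch. The slide does not give the threshold values or the quantity they are applied to, so this sketch assumes the two thresholds are levels of available memory in MB and that a processor with less available memory counts as more heavily loaded. The function name `group_by_memory` and the parameters `low_mb` and `high_mb` are hypothetical. The work-transfer step is left out because its unit of work and threshold conditions are not preserved in the transcript.

```python
def group_by_memory(available_mb, low_mb, high_mb):
    """Place processors in the X-, X, and X+ groups using two thresholds.

    available_mb[p] is the available memory of processor p. Assumption:
    at least high_mb available means lightly loaded (X-), less than
    low_mb available means heavily loaded (X+), anything between is X.
    """
    if low_mb > high_mb:
        raise ValueError("low_mb must not exceed high_mb")
    groups = {"X-": [], "X": [], "X+": []}
    for proc, mem in enumerate(available_mb):
        if mem >= high_mb:
            groups["X-"].append(proc)
        elif mem >= low_mb:
            groups["X"].append(proc)
        else:
            groups["X+"].append(proc)
    # The slide sorts X+ in ascending order of available memory, so the
    # processor with the least available memory is checked first.
    groups["X+"].sort(key=lambda p: available_mb[p])
    return groups


groups = group_by_memory([900, 120, 400, 60, 700, 90], low_mb=128, high_mb=512)
print(groups)
```

The group sizes correspond to N-, N, and N+ on the slide.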
-
Memory-based Proactive Runtime Partitioning
Better performance in moderately and heavily loaded scenarios
Most processors have less available memory
Frequent page faults result in long application delays
The memory-based algorithm yields better performance
[Figure: Memory-based proactive adaptation performance gain for the RM3D application with base grid size 128*32*32 on 8 processors.]
-
CPU-based Proactive Runtime Partitioning
The adaptive system-sensitive partitioner uses system capacities and the obtained performance function to compute the relative computational capacities of each processor
System capacity calculation: for N processors, the total work to be assigned is L [equations]
The runtime monitors application and system state
Application state: level of refinement; number, shape, and aspect ratio of refined patches
System state: computational load, memory availability, link bandwidth
The performance engine selects the appropriate performance function to predict the execution time of the application for the next time step; [symbol] is the execution time on processor k
The PF of RM3D on processor k for a given load X1 and AMR level X2 is empirically defined as: [equation]
-
CPU-based Proactive System-Sensitive Runtime Partitioning
CPU-based proactive partitioning performance gain on 16 processors. (Base grid size: 64*16*16)

Scenario            Execution time w/o CPU adaptation (s)   Execution time with CPU adaptation (s)   Percentage improvement
Lightly loaded      2126.06                                  727.17                                   65.8%
Moderately loaded   2301.15                                  1641.73                                  28.66%
Heavily loaded      2378.25                                  1624.15                                  31.71%
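The percentage improvements in the table follow from the two execution-time columns. A short check, assuming improvement is the time saved relative to the run without CPU adaptation; the function name is hypothetical and the timings are the ones reported on the slide.

```python
def percentage_improvement(time_without, time_with):
    """Time saved by adaptation, as a percentage of the unadapted time."""
    return (time_without - time_with) / time_without * 100.0


# (scenario, seconds without CPU adaptation, seconds with CPU adaptation)
results = [
    ("Lightly loaded", 2126.06, 727.17),
    ("Moderately loaded", 2301.15, 1641.73),
    ("Heavily loaded", 2378.25, 1624.15),
]

for scenario, t_without, t_with in results:
    gain = percentage_improvement(t_without, t_with)
    print(f"{scenario}: {gain:.2f}%")
```

The computed values agree with the slide's 65.8%, 28.66%, and 31.71% to within rounding.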
-
Autonomia Self-Healing
[Figure: self-healing architecture. Labels: User Application; Application Management Editor; AIK; Application Runtime Manager; Autonomic Middleware Services; Self-Healing Service; Autonomic Runtime System; Component Fault Manager; Heterogeneous Environment.]
-
Self-Healing Engine
-
Self-Protection Methodology
[Figure: Node VI over simulation time. Series: Node VI (Under Attack); Node VI (No Attack). Other series in the chart data: SYNs Received, Connection Queue, Requests Processed, Flow VI, Legitimate Flows, and Total Flows, each under attack and with no attack, plus Attack SYN and Trust SYN.]
-
Measurement Attributes for Different Protocols
Inside a network element, the measurement attributes can be monitored at different protocol layers.
During an attack (DoS attack, SQL Slammer worm, email worm, etc.), significant changes in behavior can be observed in these attributes.
-
Illustrative Network Example
Router-to-router links are 100 Mbps. Router-to-client-node links are 30 Mbps and 10 Mbps.
-
Abnormality Distance (AD)
The Abnormality Distance of measurement attributes is used as an abnormality metric for profile modeling of the component behavior.
[equation], where [symbol] and [symbol] are the mean and variance under the normal operating condition corresponding to the online measurement of attribute k.
[Figure: AD_tcp_out based on a single measurement attribute.] A larger magnitude of AD_tcp_out indicates abnormal behavior that might be due to an attack.
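The AD equation itself is not preserved in the transcript. A minimal sketch, assuming AD for attribute k is the squared deviation of the online measurement from the normal-operation mean, divided by the normal-operation variance. That form is an assumption chosen to match the slide's mention of a mean and a variance, and the baseline samples below are made up for illustration.

```python
import statistics


def abnormality_distance(measurement, mean, variance):
    """Assumed AD form: squared deviation from the mean over the variance."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    return (measurement - mean) ** 2 / variance


# Hypothetical per-second tcp_out packet counts under normal operation.
baseline = [4.0, 6.0, 5.0, 5.0, 4.0, 6.0]
mu = statistics.fmean(baseline)
var = statistics.pvariance(baseline)

for x in (5.0, 6.0, 25.0):
    print(x, round(abnormality_distance(x, mu, var), 2))
```

A measurement near the baseline mean gives a distance near zero, while a burst far outside the normal range gives a distance that is orders of magnitude larger.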
-
Multivariate Analysis Techniques on Network Attack Detection
Measurement attributes
tcpOut: legitimate outgoing TCP segments rate
tcpTotal: legitimate outgoing and spoofed outgoing TCP segments rate
NRC: Normal Region Center, which is the baseline profile for the normal state
AD: Abnormality Distance
[Figure: the Normal Region in the (tcpOut, tcpTotal) plane, bounded by UCL_tcpout, LCL_tcpout, UCL_tcptotal, and LCL_tcptotal, with NRC at its center and AD measured from NRC to an observation A.]
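One way to picture the normal region is as a box around the Normal Region Center, with each attribute bounded by its own upper and lower limit. The sketch below assumes the limits are the mean plus or minus three standard deviations (a common control-chart convention, not something the slide states) and measures distance from the center in standardized units. The class, function names, and profile numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class AttributeProfile:
    """Normal-state profile of one measurement attribute."""
    mean: float
    std: float

    def limits(self, k=3.0):
        """Lower and upper limits, assumed to be mean -/+ k * std."""
        return self.mean - k * self.std, self.mean + k * self.std

    def standardized(self, x):
        return (x - self.mean) / self.std


def in_normal_region(observation, profiles, k=3.0):
    """True when every attribute lies within its own limits."""
    for name, x in observation.items():
        low, high = profiles[name].limits(k)
        if not low <= x <= high:
            return False
    return True


def distance_from_center(observation, profiles):
    """Euclidean distance from the region center in standardized units."""
    return math.sqrt(sum(profiles[name].standardized(x) ** 2
                         for name, x in observation.items()))


profiles = {"tcpOut": AttributeProfile(5.0, 2.0),
            "tcpTotal": AttributeProfile(5.0, 2.0)}
normal = {"tcpOut": 6.0, "tcpTotal": 6.0}
spoofed = {"tcpOut": 6.0, "tcpTotal": 40.0}
print(in_normal_region(normal, profiles), in_normal_region(spoofed, profiles))
```

In the second observation tcpOut alone looks normal, but tcpTotal pushes the point outside the region, which illustrates why the attributes are analyzed jointly.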
-
Validation on Attacker Side: Spoofed TCP SYN Attack
Attack intensity and duration are adjustable
TCP SYN attack traffic is spoofed
The number of incoming/outgoing packets alone will not reveal the attack
Analyzing it jointly with the total TCP network activity can reveal the attack
[Figure: three time-series panels, each plotting number of packets against time (1.0 s): legitimate inbound TCP traffic (tcpin), legitimate outbound TCP traffic (tcpout), and total TCP activity. Legitimate inbound and outbound counts stay below 30 packets per interval, while total TCP activity reaches several thousand packets per interval during the attack.]
-
Autonomia Self-Protection Architecture
[Architecture diagram; labeled elements: Online Monitoring (raw traffic w.r.t. metric 1, metric 2, ..., metric n); Analysis Engine (Information Theory; abnormality function w.r.t. metrics 1..m; normal/abnormal characterization); Autonomic Runtime Engine; Policy Translator; actions: Change Network Topology, Change Network Configuration Parameters]
-
Working Flow of the Analysis Engine
Information theory is used to identify the most important features that can be extracted from network data.
A genetic algorithm is run on the training data to obtain the threshold and coefficients used by the linear detection rule.
The threshold and coefficients are then used to detect a wide range of attacks during the testing period.
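The training step above can be sketched as a small genetic algorithm that searches for the coefficients and threshold of a linear rule. This is a toy illustration, not the authors' implementation: the rule form (flag an attack when the weighted sum exceeds the threshold), the fitness (training accuracy), the operators (elitism, tournament selection, uniform crossover, Gaussian mutation), and all parameter values and sample data are assumptions.

```python
import random


def predict(individual, features):
    """Linear rule: flag an attack when the weighted sum exceeds the threshold."""
    *coeffs, threshold = individual
    return sum(c * x for c, x in zip(coeffs, features)) > threshold


def accuracy(individual, samples):
    """Fraction of (features, is_attack) samples the rule labels correctly."""
    hits = sum(predict(individual, x) == y for x, y in samples)
    return hits / len(samples)


def evolve(samples, n_features, generations=40, pop_size=30, seed=0):
    """Evolve coefficients and a threshold; returns (best, best_per_generation)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(n_features + 1)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=lambda ind: accuracy(ind, samples),
                        reverse=True)
        history.append(accuracy(ranked[0], samples))
        next_gen = [ranked[0][:]]  # elitism: the best rule always survives
        while len(next_gen) < pop_size:
            # Tournament selection: the fitter of two random individuals.
            a, b = (max(rng.sample(ranked, 2),
                        key=lambda ind: accuracy(ind, samples))
                    for _ in range(2))
            # Uniform crossover followed by occasional Gaussian mutation.
            child = [rng.choice(pair) for pair in zip(a, b)]
            child = [g + rng.gauss(0, 0.1) if rng.random() < 0.2 else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=lambda ind: accuracy(ind, samples))
    history.append(accuracy(best, samples))
    return best, history


# Hypothetical normalized features per sample, labeled attack (True) or normal.
samples = [((0.1, 0.2), False), ((0.2, 0.1), False),
           ((0.9, 0.8), True), ((0.8, 0.9), True)]
best, history = evolve(samples, n_features=2)
```

Because the best individual is copied unchanged into each new generation, the best training accuracy recorded in `history` never decreases from one generation to the next.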
-
Network Attack Feature Extraction
Datasets: Total Dataset, DoS + Normal, U2R + Normal, R2L + Normal, Probe + Normal
Discrete Features
The base dataset has a larger sample size.
Discrete features provide little semantic information.
Feature (X)       I(X;Y)
Is_hot_login      0
Land              0
Root_shell        0
Su_attempt        0
Is_guest_login    0.006
Flag              0.062
Protocol_type     0.304
Logged_in         0.381
service           0.571

Feature (X)       I(X;Y)
Is_guest_login    0
Is_hot_login      0
Su_attempt        0
Land              0
Logged_in         5.2e-5
Protocol_type     7.3e-5
Flag              0.0001
Root_shell        0.003
service           0.003

Feature (X)       I(X;Y)
Is_hot_login      0
Land              0
Su_attempt        2.8e-5
Root_shell        0.0002
Logged_in         0.0021
Flag              0.0033
Protocol_type     0.0039
Is_guest_login    0.0144
service           0.0505

Feature (X)       I(X;Y)
Is_hot_login      0
Land              0
Su_attempt        7e-06
Root_shell        1.4e-5
Is_guest_login    0.0022
Protocol_type     0.0386
Logged_in         0.0701
Flag              0.0807
service           0.1243
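The I(X;Y) values in these tables are mutual information between a feature X and the class label Y. A minimal sketch of how such a value can be estimated from paired samples is shown below. The logarithm base is an assumption (base 2, giving bits; the slides do not state the unit), and the sample data is hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """I(X;Y) in bits, estimated from paired samples of two discrete variables."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    total = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), with counts substituted.
        total += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return total


labels = ["normal", "normal", "attack", "attack"]

# A feature that separates the classes perfectly carries 1 bit here.
mi_informative = mutual_information([0, 0, 1, 1], labels)

# A feature that never changes carries no information about the class.
mi_constant = mutual_information([0, 0, 0, 0], labels)
```

This matches the pattern in the tables: features that are almost always the same value score at or near zero, while features whose values vary with the class score highest.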
-
Network Attack Feature Extraction (Cont.)
[Figure panels: Discrete Features on Total Dataset; Continuous Features on Total Dataset]
Continuous Features
Compared with the discrete features, some continuous features provide more information for the final detection.
The information provided by the continuous features is much more meaningful.
A partition strategy is used to discretize the continuous features.
Heuristic algorithms (e.g., a genetic algorithm) are used to determine the optimal partition.
Combining discrete and continuous features provides a better detection rate.
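The partition step can be illustrated with a small helper that maps continuous values to interval indices given a set of cut points. This is only a sketch: the feature name, the values, and the hand-picked cut points are hypothetical, and the search for an optimal partition (which the slides assign to a heuristic such as a genetic algorithm) is not shown.

```python
from bisect import bisect_right


def discretize(values, cut_points):
    """Map each continuous value to the index of the interval it falls in.

    `cut_points` must be sorted; k cut points define k + 1 intervals.
    A value equal to a cut point is placed in the interval above it.
    """
    return [bisect_right(cut_points, v) for v in values]


# Hypothetical connection durations (seconds) and hand-picked cut points.
durations = [0.0, 0.4, 2.5, 30.0, 1200.0]
bins = discretize(durations, cut_points=[1.0, 60.0])
# bins == [0, 0, 1, 1, 2]
```

A search procedure would propose candidate `cut_points`, score the resulting discrete feature (for example by how informative it is about the class), and keep the best partition.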
-
Experimental Results
We compare our approach, which is based on discrete features, with a fuzzy classifier evolved using Ctree and with the winning entry in the KDDCup99 contest.
Class     Our Approach   Ctree     Winner Entry
Normal    98.34%         92.78%    99.5%
DoS       99.33%         98.91%    97.1%
U2R       63.64%         88.13%    13.2%
R2L       5.86%          7.41%     8.4%
PROBE     93.95%         50.35%    83.3%
-
Results: Discrete vs. Cont. & Combined
We compare the results of using discrete features, continuous features, and both combined.
Class     Discrete Features   Continuous Features   Combined
Normal    98.34%              98.45%                99.98%
DoS       99.33%              99.93%                99.98%
U2R       63.64%              75.34%                98%
R2L       5.86%               41.34%                80%
PROBE     93.95%              99.91%
-
Summary and Concluding Remarks
Increased complexity, heterogeneity, uncertainty, and scale require new paradigms to design, control, and manage systems and applications.
Systems and applications need to operate reliably, securely, efficiently, and cost-effectively.
We need a Wholestic Approach that can dynamically integrate and address all these issues simultaneously at all layers of the system and application hierarchy.
Autonomic Computing provides an interesting, pragmatic approach to address these issues.
Many challenges are ahead, including:
composing and analyzing in real time the operations and states of systems and applications
the need for new bio-inspired metrics that accurately characterize and quantify the normal and abnormal states of systems and applications
Rutgers - 03/26/97 Manish Parashar
The main modules include the Application Management Editor (AME), Autonomic Middleware Services (AMS), and the Application Delegated Manager (ADM). AME enables users to develop network-centric applications and to specify the control and management policies associated with each application task. AMS provides the common middleware services and tools that applications and systems need to operate autonomically. The main focus of ADM is on setting up the application execution environment and then maintaining its requirements at runtime.
AME
- A graphical user interface for developing an application from pre-developed tasks and for specifying the management requirements used to configure and manage the execution of the application.
- The main functions offered by the editor are controlling the application editor workplace and storing the application management requirements in the Application Service Template in the Application Information and Knowledge (AIK) repository.
- Provides menu-driven task libraries that are grouped by functionality.
- A user can develop an application with selected tasks and decide the attributes of each component and the relationships between the tasks.
- Each task creates a management specification window, which includes management requirement fields such as dependencies among tasks, monitoring parameters, and fault tolerance strategy.
- We will see the design and implementation of AME later.
AMS
component repository - automatically registers tasks/components that can be used to build complex applications
resource repository - automatically registers the resources that can be used to run network-centric applications
event server - automatically registers and processes any monitored events; receives the status of any monitored attributes and then notifies the engines that subscribe to these events once they become true
application information and knowledge repository (AIK) - stores application and task service templates
monitoring service - provides the workload information and the state of the components running on each resource
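The event server's behavior described above (engines subscribe to events and are notified once the monitored condition becomes true) can be sketched as a small publish/subscribe hub. The class, method names, event name, and status fields are hypothetical and chosen only for illustration; they are not Autonomia's API.

```python
from collections import defaultdict


class EventServer:
    """Toy publish/subscribe hub: engines subscribe to named events."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_name, callback):
        """Register an engine callback for one named event."""
        self._subscribers[event_name].append(callback)

    def report(self, event_name, condition_met, status):
        """Notify subscribers only when the monitored condition is true.

        Returns the number of subscribers that were notified.
        """
        if not condition_met:
            return 0
        for callback in self._subscribers[event_name]:
            callback(event_name, status)
        return len(self._subscribers[event_name])


received = []
server = EventServer()
server.subscribe("cpu_overload",
                 lambda name, status: received.append((name, status)))

# The condition is false, so no engine is notified.
server.report("cpu_overload", condition_met=False, status={"load": 0.4})

# The condition is true, so the subscribed engine receives the status.
notified = server.report("cpu_overload", condition_met=True,
                         status={"load": 0.97})
```

Keeping the condition check inside the server means subscribing engines only see events that already hold, which is the behavior the notes describe.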
In the Autonomia environment, application development and composition use the AME services. Once that is done, the remaining activities are setting up the application execution environment and running the application. These two activities use the services offered by the autonomic middleware services (e.g., self-configuration, self-optimization, self-healing) and the Application Delegated Manager.
ADM
- The main function is setting up the application execution environment and then maintaining its requirements at runtime.
- AMS assigns one ADM to manage one or several application attributes (performance, fault, security, etc.).
- Then, for each application task, the ADM launches an appropriate Task Agent (TA) to monitor and manage the execution of that task.
- I am using task and component interchangeably here.
- The TA monitors the task execution using appropriate task sensors and intervenes using the task actuator whenever the task execution on the assigned machine cannot meet its requirements.
- Other main activities of the ADM include allocating the appropriate resources to run the application and maintaining the application requirements at runtime.
- The next step is to build the application execution environment using the AMS self-configuration engine and then run the application.
- The ADM uses the AMS autonomic runtime engines (self-healing, self-optimization, and self-protection) to maintain the requirements at runtime.