Improving Computing Systems Automatic Multi-Objective Optimization through Meta-Optimization

Lucian Vințan, Radu Chiș, Muhammad Ali Ismail, Cristian Coțofană


IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, ISSN: 0278-0070, DOI 10.1109/TCAD.2015.2501299, online published 2015, Rev. 2 (Draft),

http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7329996&filter%3DAND%28p_IS_Number%3A6917053%29

Copyright (c) 2015 IEEE.


Abstract—In search and optimization problems, one way to cope with a huge design space determined by multiple parameters is to use methods for Automatic Design Space Exploration (ADSE). FADSE is one of the frameworks that use such a solution in the context of multi-objective computer architecture optimization. In the last couple of decades many different meta-heuristics have been developed and applied to optimize various problems. According to the “no free lunch” theorem, no single algorithm exists that can solve all problems better than all other algorithms. This article presents the extension of FADSE with a meta-optimization approach, which is used to improve the performance of design space exploration algorithms by driving two different multi-objective meta-heuristics concurrently. More precisely, we selected two genetic multi-objective algorithms, NSGA-II and SPEA2, that work together in order to improve both the solutions’ quality and the convergence speed. With the proposed improvements, we ran FADSE in order to optimize the hardware parameter values of the Grid ALU Processor (GAP) micro-architecture from a bi-objective point of view (performance and hardware complexity). Using our approach we obtained better GAP instances (for almost the same CPI of 1.00, one configuration has a hardware complexity 38% smaller/better, 35.81 vs. 58.61) in half of the time required by a classical sequential optimization approach (5 days vs. 10 days).

Index Terms—Design Space Exploration, Multi-objective Optimization Algorithms, Meta-Optimization, Grid ALU Processor

I. INTRODUCTION

As long as Moore’s law remains valid, there will be opportunities to build more complex and effective processor architectures. The optimization of these architectures for various performance objectives will always be a challenging task. It is usually a multi-objective problem, most of the time with conflicting objectives, where adverse interactions are inevitable. This increases the complexity and, thus, finding good solutions to an NP-hard problem becomes even “harder”. Recent studies have proved Automatic Design Space Exploration (ADSE) to be a convenient and efficient tool for this problem. The main goal of ADSE is to find, in a feasible amount of time and using different heuristics and meta-heuristics, an optimal approximation of the Pareto front (in the general case, a hyper-surface), which consists of the best configurations with regard to multiple objectives.

A number of ADSE tools have been developed for the design space exploration of processor architectures. They include FADSE [1], Archexplorer [9], Multicube Explorer [10], Magellan [11], NASA [12], Metropolis [13], MILAN [14], EXPO [15], FAMA [16], SPLOT [17] and many more. Some of these tools use standard multi-objective heuristic methods, belonging to the class of genetic or swarm algorithms, like NSGA-II [18], SPEA2 [19], PAES [20], PESA-II [21], OMOPSO [22], MOCell [23], AbYSS [24], MOEA/D [25], FastPGA [26], IBEA [27], SMPSO [28], while others use customized ones.

In short, there are a number of ADSE tools available that use heuristics and/or meta-heuristics, but none of them uses meta-optimization to make the tool more generalized, intelligent, effective and capable of solving NP-hard computational search problems in the design space exploration of computer architectures. According to the “no free lunch” theorem [29], introduced by David Wolpert and William G. Macready in connection with search and optimization problems, no single algorithm exists that can solve all problems better than all other algorithms.

In this article we present the modifications to the FADSE tool required to implement a meta-optimization layer for the ADSE of the GAP architecture. Instead of using a single meta-heuristic at a time and repeating the same experiment with different meta-heuristics to find better possible configurations, a meta-optimization function is introduced that exploits multiple given meta-heuristics concurrently, in order to find Pareto quasi-optimal solutions with better convergence speed and diversity. In this research we have used two well-known multi-objective optimization meta-heuristics, NSGA-II [18] and SPEA2 [19], but using any number of multi-objective genetic algorithms is possible with little or no change. The configurations found have mutual confidence from all of the meta-heuristics used, with better heuristics having a stronger impact during the dynamic optimization process, within a simulation time that is equal to the time required to run just a single DSE algorithm. To achieve this, a new layer is introduced in FADSE, which takes care of the meta-optimization without making substantial changes to its existing architecture. More about this layer is discussed in subsequent sections.

This article is structured as follows: Section 2 provides an overview of the related work. The tools (FADSE and GAP) are presented in Section 3, while the Design Space Exploration and Meta-Optimization concepts, along with the objectives and parameters used for the DSE algorithms, are presented in Section 4. The research methodology and the results are shown in Section 5 and, finally, Section 6 concludes the paper and suggests directions for further work.

II. RELATED WORK

Heuristics and meta-heuristics are powerful techniques used for the automatic design space exploration of computer architectures. Besides FADSE (the focus of this paper), similar tools that use heuristics and/or meta-heuristics exist. Archexplorer [9] is an online service that allows system architects to upload their design of a system component (e.g. a cache simulator) and makes use of several genetic operators like mutation, crossover and selection. Multicube Explorer [10] is a tool quite similar to FADSE that uses many meta-heuristic-based DSE algorithms as optimizers, such as Pareto DoE, APRS, MOSA, MOPSO, NSGA-II, SEMO, FEMO, GEMO and linear scan. Another such tool is Magellan [11], which focuses on exploring multi-core architectures. Instead of using standard multi-objective heuristics, it uses customized algorithms based on Steepest Ascent Hill Climbing, Genetic Algorithms and Ant Colony Optimization. Other DSE frameworks, many oriented to specific domains, include NASA [12], Metropolis [13], MILAN [14], EXPO [15], FAMA [16] and SPLOT [17]. In short, there are DSE tools available that use heuristics and/or meta-heuristics, but none of them uses either meta-optimization or hyper-heuristics to be more generalized, intelligent, effective and capable of better solving hard computational search problems in computer architecture design space exploration.

Meta-optimization has been used in Portfolio Selection [38], Direct Search Optimization Methods [39], Large Scale Parameter Optimization [40], Compiler Heuristics [41], etc. As far as we know, we are the first to introduce a meta-optimization approach in the multi-objective design space exploration of computer architectures.

III. FADSE OPTIMIZATION TOOL AND GAP MICRO-ARCHITECTURE

In this section we present FADSE and the GAP simulator. The Framework for Automatic Design Space Exploration (FADSE) was initially developed by Horia Calborean, a former Ph.D. student, under the supervision of Prof. Lucian Vintan at the “Lucian Blaga” University of Sibiu. It allows users to perform Automatic Design Space Exploration (ADSE) of computer systems using state-of-the-art evolutionary and particle swarm multi-objective algorithms implemented in the jMetal library [31], as presented in our previous work [2][4]. FADSE also contains several quality indicators for comparing different algorithms or different runs of the same algorithm, some of which require the true Pareto front and some of which do not. For our research, the indicators of interest are those that do not require the true Pareto front, like Coverage [34], Hyper-volume [34] and Two Set Hyper-volume Difference [35], because an exhaustive search to determine the true Pareto front is impossible due to the huge search space.

FADSE can be connected to any existing computer system simulator simply by writing the corresponding connector. It has been coupled and tested with many computer architecture simulators, such as GAP and GAPtimize [7], [8] (also used in this article), M-SIM3 [32], M5 [33], Multi2Sim [36] and Sniper [37]. Further details are given in [1].

The Grid ALU Processor (GAP), introduced by Uhrig et al. [8], is a novel superscalar, in-order processor architecture designed to speed up the execution of sequential instruction streams. Its main novelty consists in replacing the execution units with an array of functional units.

The objectives for the GAP optimization problem are, firstly, the average number of clock cycles per instruction (CPI) and, secondly, the hardware complexity metric introduced as a model in [6]. Both objectives have to be minimized and they are conflicting. For validation, we used our proposed meta-optimization approach to automatically find the GAP parameter values that minimize both objectives. We computed them by simulating the processing of 10 benchmarks from the well-known MiBench suite on the GAP simulator.

IV. META-OPTIMIZATION AND DESIGN SPACE EXPLORATION CONCEPTS

Meta-optimization is a technique in which an optimization algorithm is applied as a meta-level optimizer to other optimization algorithms. The algorithms being optimized are called base-level algorithms, while the one that solves the global parameterized optimization problem is called the meta-level algorithm. A related concept is that of hyper-heuristics. The main difference between hyper-heuristics and meta-optimization is that the former uses heuristics to drive other low-level heuristics, while the latter uses various forms of control (static functions or even heuristics) at any layer of the problem. In fact, hyper-heuristics are part of the more encompassing meta-optimization field. In meta-optimization, there can be no learning, offline learning or online learning. More about meta-optimization and hyper-heuristics can be found in [30]. The two meta-heuristics used in this article to run a multi-objective DSE on computer architectures are NSGA-II and SPEA2. Both algorithms use the concept of Pareto efficiency, which is very useful in these cases. Pareto efficiency, or Pareto optimality, is a concept initially used in economics, with applications nowadays in mathematics, engineering, the social sciences, etc. It describes situations in which no objective can be improved without worsening at least one other objective.
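A concrete illustration of Pareto dominance in a minimization setting (the one used throughout this paper) is the short Python sketch below; the (CPI, hardware complexity) pairs in the usage lines are hypothetical values, not measured GAP results.

    def dominates(a, b):
        """Return True if objective vector `a` Pareto-dominates `b` in a
        minimization problem: `a` is no worse in every objective and strictly
        better in at least one."""
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    # Hypothetical (CPI, hardware complexity) pairs:
    print(dominates((1.00, 35.8), (1.05, 58.6)))    # True: better in both objectives
    print(dominates((0.52, 1137.0), (1.00, 35.8)))  # False: the two objectives conflict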

The NSGA-II algorithm was proposed by Deb in [18]. It is a genetic algorithm based on obtaining a new population from the original set by applying the three well-known genetic operators (selection, crossover, and mutation). Selecting the individuals that will be in the next generation is done as follows: the individuals in the two populations (parent population and offspring) are combined and sorted into different successive “Pareto fronts” from where the best solutions are chosen to create the new population. The elitist selection of the best individuals starts from the first non-dominated Pareto front. Differentiating individuals belonging to the same Pareto front is done by a density estimation based on measuring the crowding distance to the surrounding individuals. This additional step is done to ensure diversity in the new population.
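A minimal sketch of the survivor selection just described (non-dominated sorting followed by crowding-distance ranking) is shown below. It illustrates the standard NSGA-II mechanism [18] rather than FADSE's actual implementation; the dominance test is the same helper sketched earlier.

    def dominates(a, b):
        """Pareto dominance for minimization (same helper as sketched earlier)."""
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def non_dominated_sort(objs):
        """Split a list of objective vectors into successive Pareto fronts,
        returned as lists of indices (front 0 is the non-dominated set)."""
        fronts, remaining = [], set(range(len(objs)))
        while remaining:
            front = [i for i in remaining
                     if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
            fronts.append(front)
            remaining -= set(front)
        return fronts

    def crowding_distance(objs, front):
        """Crowding distance of each index in a front (larger means less crowded)."""
        dist = {i: 0.0 for i in front}
        for m in range(len(objs[front[0]])):
            ordered = sorted(front, key=lambda i: objs[i][m])
            span = (objs[ordered[-1]][m] - objs[ordered[0]][m]) or 1.0
            dist[ordered[0]] = dist[ordered[-1]] = float("inf")
            for k in range(1, len(ordered) - 1):
                dist[ordered[k]] += (objs[ordered[k + 1]][m] - objs[ordered[k - 1]][m]) / span
        return dist

    def nsga2_select(objs, n):
        """Choose n survivors: fill whole fronts first, then break the tie in the
        last accepted front by decreasing crowding distance."""
        selected = []
        for front in non_dominated_sort(objs):
            if len(selected) + len(front) <= n:
                selected.extend(front)
            else:
                dist = crowding_distance(objs, front)
                selected.extend(sorted(front, key=dist.get, reverse=True)[:n - len(selected)])
                break
        return selected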

SPEA2 was proposed by Zitzler in [19]. Each individual has a fitness value based on the number of individuals dominated by those that are better than itself, also taking into account a density estimation. This algorithm uses an external archive, which holds the best (non-dominated) individuals found up to a certain point, and applies the well-known genetic operators (selection, crossover and mutation) to create the next generation. Then, the new population and the archive are reunited, and the non-dominated individuals among them form the archive of the next iteration. If the number of non-dominated individuals is greater than the archive size, a truncation operator based on the distance to the k-th nearest neighbor is used: the individuals having the minimum distance to their neighbors are iteratively removed. This way diversity is preserved, similarly to the crowding mechanism in NSGA-II.
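The environmental selection of SPEA2 can be sketched in the same style: when the non-dominated set overflows the archive, the truncation operator repeatedly removes the point closest to its neighbors. This is a simplified illustration of the procedure in [19]; the full algorithm also computes strength-based fitness and breaks ties using farther neighbors.

    import math

    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def spea2_truncate(archive_objs, max_size):
        """Iteratively drop the individual with the smallest distance to its
        nearest neighbor until the archive fits; this preserves the spread of
        the retained points."""
        objs = list(archive_objs)
        while len(objs) > max_size:
            nearest = [min(euclidean(objs[i], objs[j])
                           for j in range(len(objs)) if j != i)
                       for i in range(len(objs))]
            objs.pop(nearest.index(min(nearest)))   # remove the most crowded point
        return objs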

In order to compare different algorithms we need some metrics. Some of these metrics need the Pareto front to be known beforehand. As this is infeasible, because it can be computed only through an exhaustive search, they are not of interest in our work.

Some quality metrics that do not require the Pareto front to be known are:

1) Coverage
Introduced in [34], it can be used to compare two multi-objective DSE algorithms by the fraction of individuals from one algorithm that dominate the individuals from the other algorithm.

2) Hyper-volume
This metric has been introduced in [34] and computes the volume enclosed by the current Pareto front and the axes in a maximization problem. For a minimization problem, the metric computes the volume enclosed between the hyper-volume reference point and the Pareto front approximation. Considering our minimization problem, the hyper-volume reference point coordinates are set to the maximum found values of the objectives.

3) Two set hyper-volume difference
This measure was proposed in [35]. Considering $X_1$ and $X_2$ as two sets of phenotype decision vectors, the TSHD is defined by the following formula:

$$TSHD(X_1, X_2) = HV(X_1 + X_2) - HV(X_2)$$

where $X_1 + X_2$ is the union of the two vector sets $X_1$ and $X_2$, and $HV$ is the normal hyper-volume measure. This way, $TSHD(X_1, X_2)$ gives the hyper-volume of the portion of the objective space that is dominated by the first set but not by the second. If $TSHD(X_1, X_2) = 0$ and $TSHD(X_2, X_1) > 0$, we can say that $X_2$ is definitely better than $X_1$.
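For our bi-objective minimization case, the three indicators can be computed as sketched below. The hyper-volume routine assumes exactly two objectives and a reference point that is no better than any evaluated point; the numeric values in the usage lines are hypothetical.

    def dominates(a, b):   # Pareto dominance for minimization, as before
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def hypervolume_2d(front, ref):
        """Area dominated by a set of 2-D points (minimization) with respect to
        the reference point ref = (ref_x, ref_y)."""
        pts = sorted(p for p in front
                     if not any(dominates(q, p) for q in front if q != p))
        area, prev_y = 0.0, ref[1]
        for x, y in pts:                              # left-to-right sweep
            area += max(ref[0] - x, 0.0) * max(prev_y - y, 0.0)
            prev_y = min(prev_y, y)
        return area

    def coverage(a, b):
        """Fraction of set b dominated by at least one point of set a."""
        return sum(any(dominates(p, q) for p in a) for q in b) / len(b)

    def tshd(x1, x2, ref):
        """Two Set Hyper-volume Difference: volume dominated by x1 but not by x2."""
        return hypervolume_2d(list(x1) + list(x2), ref) - hypervolume_2d(x2, ref)

    # Hypothetical (CPI, hardware complexity) fronts:
    ref = (3.0, 3000.0)
    a = [(0.52, 2400.0), (1.00, 36.0)]
    b = [(0.74, 2400.0), (2.30, 56.0)]
    print(coverage(a, b), tshd(a, b, ref) > 0.0)      # 1.0 True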

V. RESEARCH METHODOLOGY AND RESULTS

Currently, in FADSE, the selection of the meta-heuristic to be used is done before the DSE process of the architecture under study starts. The experiment has to be repeated with different meta-heuristics to get a better picture of the design space and of the best solutions. It has been observed that some meta-heuristics perform better on some of the objectives, while other algorithms perform better on others. For example, a meta-heuristic might have a good convergence speed but might not deliver good solutions (i.e., the Pareto front approximation is not close to the real Pareto front), while other meta-heuristics present the opposite behavior, i.e., finding good solutions (a Pareto front approximation closer to the real Pareto front), but with a longer processing time. Additionally, in order to improve the solutions' quality and the convergence speed, domain knowledge and constraints have been incorporated into FADSE, but they are also meta-heuristic specific.

Instead of using a single meta-heuristic at a time and repeating the same experiment with different meta-heuristics for finding better possible configurations, a meta-optimization function is introduced that exploits multiple given meta-heuristics concurrently, to obtain Pareto quasi-optimal solutions, with better convergence speed and diversity, in the same amount of time. The configurations found enjoy mutual confidence from all of the meta-heuristics used, with better heuristics having more of an impact. For this, a new layer is introduced in FADSE, which takes care of the meta-optimization, without making much change to its existing architecture.

A. Meta-optimization in FADSE

For this study, we have considered two well-known multi-objective optimization algorithms: NSGA-II (A) and SPEA2 (B). The reasons for selecting these algorithms are: firstly, they are well-established multi-objective optimization algorithms and, secondly, they have already been used extensively in FADSE to drive many of the computer architecture simulators, including GAP, the one selected for the validation in this paper.

Fig. 1. Meta-optimization concept


Fig. 2. FADSE Architecture with meta-optimization layer

The introduced meta-optimization layer is responsible for creating a master population by selecting individuals from the offspring produced by the various base-level meta-heuristics. This master population is then fed as the parent population to every meta-heuristic in order to generate new offspring. From these new offspring a new master population is once again created, and the process repeats itself. The proportion of individuals submitted by a meta-heuristic (ρ) to the master population is dynamically adjusted every generation according to its performance. The performance score is computed from the results of multiple quality indicators. This represents an online learning mechanism and opens the way to better and more diverse solutions, as well as faster convergence. In this way, for every generation after the initial random one, the meta-heuristics performing better will offer more individuals to the master population, at run-time, which drives the simulations towards better solutions. The conceptual addition of the meta-optimization layer to the current architecture of FADSE is depicted in Fig. 2.

For clarity, the following notations are used in the next paragraphs: $N$ represents the number of individuals in a generation; $t$ is the iteration index; $A$ and $B$ are indices referring to meta-heuristic A (NSGA-II) and B (SPEA2); the index $M$ specifies that a population $P$ is a master/parent population; $C$ specifies that a population $P$ is a combined population; and the prime symbol (') specifies that the individuals in the corresponding population have been evaluated, i.e., their objectives are known. $\rho$ represents the percentage of individuals that should be generated by each of the algorithms in the current iteration (also named the selection variable); it is updated after each iteration and reflects the quality of the individuals produced by each meta-heuristic.
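Using this notation, the meta-level control flow can be sketched as follows. All problem- and tool-specific pieces (individual initialization, evaluation, the base-level meta-heuristics, the selection-variable update and the duplicate handling) are passed in as callables, so this is an illustration of the loop structure rather than FADSE's actual implementation.

    def meta_optimize(init_individual, evaluate, metaheuristics, n, generations,
                      update_rho, merge_unique, rho_min=0.1):
        """Meta-level loop driving several base-level meta-heuristics.

        init_individual()                      -> one random configuration
        evaluate(pop)                          -> list of objective vectors
        metaheuristics[name].reproduce(parents, parent_objs, count) -> offspring
        metaheuristics[name].select(pool, pool_objs, count) -> selected indices
        update_rho(picked, rho)                -> new selection variables
        merge_unique(picked, rho, n)           -> n unique (individual, objectives) pairs
        """
        rho = {name: 1.0 / len(metaheuristics) for name in metaheuristics}  # equal shares at t = 0
        master = [init_individual() for _ in range(n)]
        master_objs = evaluate(master)
        for _ in range(generations):
            picked = {}
            for name, mh in metaheuristics.items():
                # offspring count with the 10% floor discussed in Section V
                # (rounding so that the counts sum exactly to N is omitted here)
                count = max(int(rho[name] * n), int(rho_min * n))
                children = mh.reproduce(master, master_objs, count)
                child_objs = evaluate(children)
                pool, pool_objs = master + children, master_objs + child_objs   # PC'_At
                chosen = mh.select(pool, pool_objs, count)                      # best N_At
                picked[name] = [(pool[i], pool_objs[i]) for i in chosen]
            pairs = merge_unique(picked, rho, n)            # duplicate handling, sketched later
            master = [ind for ind, _ in pairs]
            master_objs = [objs for _, objs in pairs]
            rho = update_rho(picked, rho)                   # metric-driven update, see Section V.B
        return list(zip(master, master_objs))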

The proposed algorithm starts, at $t = 0$, from a single random population $P_{Mt}$ of $N$ individuals, the first parent/master population. The individuals are evaluated so that each of the corresponding objectives is known. The evaluated population $P'_{Mt}$ is then fed to both meta-heuristics to generate the respective populations, $P_{At}$ and $P_{Bt}$, with the number of new offspring being $N_{At} = \rho_{At} \cdot N$ and $N_{Bt} = \rho_{Bt} \cdot N$, according to the given proportions, such that $N_{At} + N_{Bt} = N$. $P_{At}$ and $P_{Bt}$ are then evaluated and each one is separately combined with the master population to form bigger populations, $PC'_{At} = P'_{Mt} \cup P'_{At}$ and, respectively, $PC'_{Bt} = P'_{Mt} \cup P'_{Bt}$. From these populations, using the specific selection mechanisms of every meta-heuristic, the best $N_{At}$ and $N_{Bt}$ individuals are extracted and combined to form the new master population $P'_{M(t+1)}$. To calculate a fair proportion for every meta-heuristic and to track, compare and control their progress, we need a performance equation based on some standard quality indicators. The individuals of the master population are therefore used to calculate the proportion of each meta-heuristic for the next generation, $\rho_{A(t+1)}$ and $\rho_{B(t+1)}$, using an aggregation of the two set hyper-volume difference and coverage performance metrics.

The selection of the best individuals from the combined populations, when forming a new master population, may lead to duplicate selections of some of the solutions. These individuals belong to the previous master population $P'_{Mt}$ and have been selected by both meta-heuristics because they are better than all of their respective offspring. This phenomenon happens mainly towards the end of the algorithm, when the master population better approximates the true Pareto front; therefore, it is highly unlikely that new offspring offer a Pareto improvement. The problem actually appears because of the two constraints that must be satisfied: firstly, that the $N$ selected individuals should be unique and, secondly, that they should respect the selection variables $\rho_{At}$ and $\rho_{Bt}$.


We have solved this problem using the strategy presented below.

The best $N$ individuals selected can be grouped by which meta-heuristics chose them: $NSel_A$ individuals selected only by meta-heuristic A, $NSel_B$ individuals selected only by meta-heuristic B and $NSel_{AB}$ individuals selected by both. If we consider $NS_A$ the number of all the individuals selected by A and $NS_B$ the number of all the individuals selected by B, then we have the following identities:

$$NS_A = NSel_A + NSel_{AB}$$

$$NS_B = NSel_B + NSel_{AB}$$

In total, we have $NSel_A + NSel_B + 2 \cdot NSel_{AB} = N$ individuals, out of which only $NSel_A + NSel_B + NSel_{AB}$ are unique, while the other $NSel_{AB}$ are duplicates. Therefore, we have to select $NSel_{AB} = N'$ more individuals in order to satisfy the first constraint mentioned above, namely that we should finally have $N$ unique solutions. We will then determine meta-heuristic A to select $NSel'_A$ more individuals and, analogously, meta-heuristic B to select $NSel'_B$ additional solutions. In the end, in total, each algorithm will have selected $NS'_A$, respectively $NS'_B$, individuals, with:

$$NS'_A = NS_A + NSel'_A$$

$$NS'_B = NS_B + NSel'_B$$

$$NS'_A + NS'_B = N + N'$$

The question that remains is how large $NS'_A$ and $NS'_B$ should be. We begin from our purpose, which is to respect the two ratios, $\rho_A$ and $\rho_B$. We have the following relationships:

$$\rho_A = \frac{NS'_A}{NS'_A + NS'_B} \qquad \text{and} \qquad \rho_B = \frac{NS'_B}{NS'_A + NS'_B}$$

$$\rho_A = \frac{NS'_A}{NS'_A + NS'_B} = \frac{NSel_A + NSel_{AB} + NSel'_A}{NSel_A + NSel_B + 2 \cdot NSel_{AB} + NSel'_A + NSel'_B} = \frac{NS_A + NSel'_A}{N + N'}$$

$$\Leftrightarrow \rho_A \cdot (N + N') = NS_A + NSel'_A$$

$$\Leftrightarrow NS_A + \rho_A \cdot N' = NS_A + NSel'_A \qquad \text{(since } NS_A = \rho_A \cdot N\text{)}$$

$$\Leftrightarrow NSel'_A = \rho_A \cdot N'$$

Analogously, $NSel'_B = \rho_B \cdot N'$. Through our computations we have come to an intuitive result: needing $N'$ additional individuals, they will be selected by each meta-heuristic according to the same proportions as before. Duplicates can once again appear, thus the solution is to apply these selections iteratively until we finally select $N$ unique individuals.
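A sketch of this iterative duplicate-resolution strategy is given below. The extra_supplier callback is a hypothetical stand-in for asking a meta-heuristic for its next-best individuals; when plugged into the earlier loop sketch, that argument would be bound beforehand (for example with functools.partial).

    def merge_unique(picked, rho, n, extra_supplier):
        """Build a master population of n unique individuals from the sets already
        selected by each meta-heuristic, honoring the proportions in rho.

        picked[name]  -- list of (individual, objectives) pairs selected by `name`
        extra_supplier(name, count, exclude) -- up to `count` further pairs whose
            individuals are not in `exclude` (the meta-heuristic's next-best picks)
        """
        unique = {}                                     # individual (as tuple) -> objectives
        for pairs in picked.values():
            for ind, objs in pairs:
                unique[tuple(ind)] = objs               # duplicate selections collapse here
        while len(unique) < n:
            missing = n - len(unique)                   # N' individuals still needed
            before = len(unique)
            for name, share in rho.items():
                need = max(1, round(share * missing))   # NSel' = rho * N'
                for ind, objs in extra_supplier(name, need, set(unique)):
                    if len(unique) < n:
                        unique[tuple(ind)] = objs
            if len(unique) == before:                   # supplier exhausted: stop early
                break
        return [(list(ind), objs) for ind, objs in unique.items()]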

B. Performance metrics

Convergence and diversity are the two well-known criteria used to compare different multi-objective optimization algorithms or different runs of the same algorithm (with the same or different parameters). A number of quality indicators for measuring these two criteria have been proposed, like Generational Distance (GD), Inverse Generational Distance (IGD), Hyper-volume (HV), Epsilon, Spread, Generalized Spread and many others. Some are intended to measure only convergence or only diversity, while others take both criteria into account. Also, some of them need the Pareto front to be known beforehand and some do not.

To calculate a fair proportion of contribution to the master population for every meta-heuristic and to track, compare and control their progress, we need a performance equation based on some standard performance metrics. Due to our problem's complexity, we can only use metrics that do not require the true Pareto front. Therefore, we have selected two performance metrics, the Two Set Hyper-volume Difference and Coverage, which we have briefly presented in Section IV. Both are widely used in the literature and are also available in FADSE.

The selection variable for meta-heuristic A (NSGAII) is calculated according to the following formula:

Fig. 3. Meta-optimization algorithm flow


Fig. 4. Coverage Meta-optimization vs NSGAII

The parameter α is a coefficient that modifies the influence of the two metrics, while $\widehat{TSHD}_{At}$ and $\hat{C}_{At}$ are the normalized values of the Two Set Hyper-volume Difference and Coverage, respectively. The Two Set Hyper-volume Difference is normalized as follows:

The parameter ε is a small value used to avoid division by zero when normalizing. Analogously, the normalized Coverage is calculated using similar formulas, as is the selection variable for meta-heuristic B (SPEA2).
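A minimal sketch of a selection-variable update of this shape is given below. The α-weighted aggregation of the two normalized indicators, the ε-guarded normalization and the 10% floor follow the description in the text, but they are assumptions about the exact form of the formulas rather than the paper's own equations; a thin wrapper computing the TSHD and coverage values from the two selected sets would sit between this function and the loop sketched in Section V.A.

    def update_selection_variables(tshd_a, tshd_b, cov_a, cov_b,
                                   alpha=0.5, eps=1e-9, rho_min=0.1):
        """Plausible update of the selection variables for meta-heuristics A and B.

        tshd_a / tshd_b are the TSHD values of each algorithm's selections against
        the other's; cov_a / cov_b are the corresponding coverage values.  Each
        pair is normalized to sum to 1, the two normalized indicators are combined
        with weight alpha, and the result is clamped to the 10% minimum share."""
        tshd_a_hat = (tshd_a + eps) / (tshd_a + tshd_b + 2 * eps)   # normalized TSHD
        cov_a_hat = (cov_a + eps) / (cov_a + cov_b + 2 * eps)       # normalized coverage
        rho_a = alpha * tshd_a_hat + (1 - alpha) * cov_a_hat        # aggregated score
        rho_a = min(max(rho_a, rho_min), 1.0 - rho_min)             # enforce the floor
        return rho_a, 1.0 - rho_a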

C. Results

The design space exploration process was done with 6 architectural parameters on 10 benchmarks (dijkstra, jpeg-encode, jpeg-decode, stringsearch, qsort, gsm-encode, gsm-decode, encode-nounroll, decode-nounroll, file-decode) selected from the MiBench suite [5]. The input parameters of the GAP architecture are described in Table I.

The design space has in this case over 1.2 million configurations. The DSE process was run with 100 individuals per generation, for 100 generations. The optimization was performed on an Intel i7-4770K processor (4 cores / 8 threads) running at 4.4 GHz, on which 6 FADSE clients and a FADSE server process were started. For every one of the 100 generations we simulated 100 individuals, and each individual was evaluated on 10 benchmarks. In the results presented further down, we considered the average values of the outputs obtained for all the simulated benchmarks (for example, the CPI fitness of each individual belonging to a generation represents the arithmetic mean of the CPIs obtained on each of the 10 benchmarks). In total, all the simulations that we ran took around 2 weeks. During this time each of the runs (NSGAII, SPEA2 and Meta-Optimization) created more than 3000 unique individuals, while together they generated approximately 8700 unique individuals. An exhaustive search is not feasible because of the huge search space of 1.2 million configurations; evaluating the whole search space on the same machine would take around 5 years.

The NSGAII, SPEA2 and meta-optimization runs each generated N individuals per generation and each ran for approximately the same time T. The NSGAII+SPEA2 run is a superposition of the NSGAII and SPEA2 runs, where we have combined the populations generated at step t into a 2N-sized population. Thus, the NSGAII+SPEA2 run required 2T simulation time and 2N individuals per generation.

The results we gathered show an expected outcome but also initially unpredicted effects, like the synergism of the meta-optimization. We summarize our findings in the following paragraphs.

The coverage metric is a quality metric that can be used to compare two different algorithms, or two runs of the same algorithm, by the fraction of individuals from one algorithm that are non-dominated by individuals from the other algorithm (run). Fig. 4 and Fig. 5 present the coverage over the 100 generations. It can be observed that the meta-optimization run is better than each of the normal NSGAII and SPEA2 runs, because more individuals from the meta-optimization run are non-dominated by individuals from the other ones. A quite interesting observation is the fact that more than 20% of the individuals from the normal runs are still not dominated by individuals from the meta-optimization. This is a result of the search being a non-exhaustive stochastic process in a huge design space of 1.2 million configurations, with a relatively small number of generations (100). Thus, a base-level meta-heuristic by itself would still offer valuable insight into the design space, but to a lesser extent.

TABLE I
GAP PARAMETERS

Group          Parameter  Description     Domain
GAP's array    p0         Rows            4, 5, 6, …, 32
               p1         Columns         4, 5, 6, …, 31
               p2         Layers          1, 2, 4, …, 64
Instr. cache   p3         Line size       4, 8, 16
               p4         Sets            32, 64, 128, …, 8192
               p5         Lines per set   1, 2, 4, …, 128
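The size of the design space quoted above follows directly from the parameter domains in Table I; the short computation below reproduces the figure of over 1.2 million configurations.

    # Cardinality of each parameter domain in Table I
    rows          = len(range(4, 33))   # p0: 4..32                -> 29 values
    columns       = len(range(4, 32))   # p1: 4..31                -> 28 values
    layers        = 7                   # p2: 1, 2, 4, ..., 64     ->  7 values
    line_sizes    = 3                   # p3: 4, 8, 16             ->  3 values
    cache_sets    = 9                   # p4: 32, 64, ..., 8192    ->  9 values
    lines_per_set = 8                   # p5: 1, 2, ..., 128       ->  8 values

    total = rows * columns * layers * line_sizes * cache_sets * lines_per_set
    print(total)                        # 1227744, i.e. "over 1.2 million"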

Fig. 5. Coverage Meta-optimization vs SPEA2


Fig. 6 shows the coverage between the meta-optimization and the superposition of the NSGAII and SPEA2 runs (NSGAII+SPEA2). Superposition means building the Pareto front obtained by superimposing the Pareto fronts obtained by the two DSE algorithms; this involves running the two base-level algorithms separately. Interesting here is that the NSGAII+SPEA2 run has more non-dominated individuals than the meta-optimization run. This can be explained by the fact that the combination of NSGAII and SPEA2 generated roughly twice as many individuals as the meta-optimization, each generation having 2N individuals instead of just N.

The hyper-volume computes the area enclosed by the current Pareto front approximation and the hyper-volume reference point in our 2-D minimization problem. Looking at this indicator as a convergence metric, we can state that all algorithms have a very fast convergence speed. NSGAII, SPEA2 and also the combination of NSGAII and SPEA2 have found almost their best solutions at around 40 generations, and their quality is not going to increase significantly in the foreseeable future. Looking at Fig. 7, it can be seen that the meta-optimization run continues to improve even after the 50th generation. The hyper-volume can also be regarded as a quality metric; therefore, we can say that the meta-optimization run gives better results than both the normal runs of NSGAII and SPEA2, but also than the combination of the populations of NSGAII and SPEA2. This result might be counter-intuitive, because one would think that a run completed with N individuals in a period of T time could not yield better results than a run with 2N individuals in 2T time. The meta-optimization approach shows here a remarkable synergy of the combined runs.

Both metrics above have shown that our new meta-optimization approach is better than NSGAII, SPEA2 and their combination (superposition). The real question is by how much it is better. In previous articles we have shown that NSGAII is better than SPEA2 both in hyper-volume and coverage, but the real differences in solutions' quality are very small, about 1-2%. Looking at the Two Set Hyper-volume Difference in Fig. 8, we see that more than 99% of the hyper-volume is common to both the meta-optimization run and the superposition of NSGAII and SPEA2. Thus, both runs differ by at most 1%: the meta-optimization run brings a novelty of more than 0.2% to the explored design space, while NSGAII+SPEA2 brings less than 0.1%. Looking at Fig. 8, the hyper-volume dominated only by the NSGAII+SPEA2 run is getting smaller and smaller (the solution quality stagnates), while the one dominated by the meta-optimization run is getting bigger (the latter is finding better solutions even in the last generations).

Looking at the Pareto front approximation found by each of the meta-heuristics, NSGAII and SPEA2, we can see that NSGAII better explores the hardware complexity objective, while SPEA2 offers the best solution in terms of performance (clocks per instruction). Each of them finds the best results for one of the objectives, as shown in Table II. Every entry in this table contains a pair of numbers representing the output objectives (CPI, Hardware Complexity). Each line contains the best individuals from two distinct points of view: the left one represents the individual having the best CPI belonging to the Pareto front and the right one represents the individual having the best Hardware Complexity belonging to the Pareto front. Fig. 9 shows the Pareto front approximations for NSGAII and SPEA2.

Fig. 6. Coverage Meta-optimization vs NSGAII+SPEA2

Fig. 7. Hyper-volume comparison

Fig. 8. Two set hyper-volume difference


Fig. 10 shows a restricted view of the Pareto front approximations of the superposition NSGAII+SPEA2 and the meta-optimization run. It can be clearly seen that there are some individuals that are dominated by the combined run, and there are zones where the meta-optimization dominates, even though its individuals are sparser there. This is why coverage can give false impressions: it does not take into consideration the distance/area between the Pareto front approximations, only the number of individuals that are dominated (a cardinality metric). This is where the TSHD metric really excels, showing that the meta-optimization run dominates a bigger space, although with fewer individuals. The exploration of the hardware complexity objective contributes to the better results found by the new algorithm.

In order to have offspring generated by both algorithms in every generation, we have introduced a minimum threshold for the number of individuals selected from each low-level meta-heuristic. In our simulations we set this value to 10%, which means that even if the Selection Variable has been computed to a value smaller than 10%, in the next generation 10% of the N individuals will still be created by that specific algorithm. The percentage of individuals per generation can be seen in Fig. 11. It can be observed that a greater percentage of the offspring is submitted by the meta-optimization NSGAII (MONSGAII) than by the meta-optimization SPEA2 (MOSPEA2). This behavior is expected, because NSGAII has better results on the coverage and hyper-volume metrics, which dictate the computation of the Selection Variables and, thus, the number of individuals generated by each algorithm.

VI. CONCLUSIONS AND FURTHER WORK

All the simulations have been started from the same random initial population. It has been observed that the new meta-optimization method yields better results than the combination of two runs with different meta-heuristics, all of this in half the time and with half the total number of simulations. The differences between the results with respect to the TSHD are smaller than 1%, but the meta-optimization has 3 times more non-dominated space than the combination of NSGAII and SPEA2 (0.2% vs. 0.1%). The coverage metric can be misleading, giving the false impression that some results are much better than others (a 10% difference is quite high in optimization problems), or that the meta-optimization approach is worse than the combination run. This happened because the number of individuals simulated in the superposition approach was twice as large as in the meta-optimization approach.

Using two meta-heuristics, the meta-optimization approach halved the number of individuals from 2N to N and the simulation time from 2T to T (from 10 days to 5 days of simulation), while providing better results (a configuration has, for almost the same CPI value of 1.00, a hardware complexity 38% smaller/better, 35.81 vs. 58.61, as shown in Fig. 10). The synergism obtained in the proposed meta-optimization algorithm was not expected by our initial hypotheses. Depending on the problem, having a larger number of individuals per generation or increasing the number of populations should improve the solutions' quality of the meta-optimization runs to the point where fewer and fewer individuals found by the combination of the meta-heuristics would be missing from the results of the algorithm presented in this paper.

TABLE II
BEST FOUND CONFIGURATIONS

                   Best Clocks per instruction   Best Hardware complexity
NSGAII             0.523734116, 1137.273314      2.296665421, 56.21046635
SPEA2              0.523169018, 2346.387481      0.740772914, 71.75130072
Meta-Optimization  0.523863348, 1484.907481      1.003468038, 35.81644

Fig. 9. Pareto front approximation NSGAII and SPEA2

Fig. 10. Pareto front approximation NSGAII+SPEA2 and Meta-optimization

Fig. 11. Number of generated off-springs by the meta-optimized algorithms



Due to the nature of the meta-optimization layer's implementation, the meta-level algorithm can be changed to take other metrics or algorithms into consideration, and the number of meta-heuristics is not limited. Any number of algorithms could be used to run design space explorations, thus offering a scalable approach. The implementation of the meta-level algorithm can also be easily changed to incorporate other logic.

Our meta-optimization approach has improved the FADSE tool and made it more general: every problem has an algorithm that is best fitted for it from a certain point of view. Having more algorithms incorporated in FADSE through the meta-optimization layer, we can obtain better synergistic results. For example, in our validation case (the GAP architecture), NSGAII finds better and more diverse results for the Hardware Complexity objective, while SPEA2 finds better and more diverse results for the CPI objective (see Fig. 9). Putting these two meta-heuristics to work together, we demonstrated a certain synergism, proving that our approach is more effective. By setting a threshold for the minimum dynamic percentage of individuals created by each algorithm, some diversity is also preserved, which has proved to improve the solutions' quality.

Another advantage consists in the fact that our approach is more versatile, with a significant impact on solving NP-hard computational search problems. This means that the end user does not have to manually search for the best meta-heuristic for a given optimization problem; our approach automatically detects at run-time the “best” instantaneous algorithm based on methods belonging to the machine learning research field.

As a potential shortcoming, we mention that there are some minor technical changes needed to make a certain multi-objective algorithm “compatible” with our meta-optimization layer interface.

As further work, we intend to integrate more meta-heuristics into the meta-optimization process. This further development is straightforward for new genetic algorithms; even at this moment our approach allows the usage of a variable number of such genetic algorithms. An open problem for us at the moment is how to integrate some Particle Swarm Optimization class algorithms. This might be more difficult because these algorithms have some specific additional constraints, such as the parameters C1 and C2, which control the effect of the personal and global best particles. Other types of algorithms that might be useful to integrate are the Imperialist Competitive Algorithms. Other ideas include improving the meta-level algorithm's learning method; investigating the optimal number of individuals per generation would also be of interest.

REFERENCES

[1] H. A. Calborean, "Multi-Objective Optimization of Advanced Computer Architectures using Domain-Knowledge," PhD Thesis, "Lucian Blaga" University of Sibiu, Sibiu, 2011 (Supervisor: Prof. L. Vintan).

[2] R. Chis, M. Vintan, and L. Vintan, "Multi-objective DSE Algorithms' Evaluations on Processor Optimization," in Proceedings of the 9th International Conference on Intelligent Computer Communication and Processing, Cluj-Napoca, September 2013, pp. 27-34.

[3] R. Chis and L. Vintan, "Multi-Objective Hardware-Software Co-Optimization for the SNIPER Multi-Core Simulator," in Proceedings of the 10th International Conference on Intelligent Computer Communication and Processing, Cluj-Napoca, September 2014, pp. 3-9.

[4] R. Jahr, H. Calborean, L. Vintan, and T. Ungerer, “Finding near-perfect parameters for hardware and code optimizations with automatic multi-objective design space explorations,” Concurr. Comput. Pract. Exp., doi: 10.1002/cpe.2975, 2012

[5] S. Uhrig., B. Shehan., R. Jahr, T. Ungerer, “The Two-dimensional Superscalar GAP Processor Architecture”. International Journal on Advances in Systems and Measurements. 3, 71 – 81 (2010)

[6] R. Jahr, T. Ungerer, H. Calborean, and L. Vintan, “Automatic multi-objective optimization of parameters for hardware and code optimizations,” in High Performance Computing and Simulation (HPCS), 2011 International Conference on, 2011, pp. 308–316.

[7] B. Shehan, R. Jahr, S. Uhrig, and T. Ungerer, “Reconfigurable Grid Alu Processor: Optimization and Design Space Exploration,” in Digital System Design: Architectures, Methods and Tools (DSD), 2010 13th Euromicro Conference on, 2010, pp. 71–79.

[8] S. Uhrig, B. Shehan, R. Jahr, and T. Ungerer, “A Two-Dimensional Superscalar Processor Architecture,” in Future Computing, Service Computation, Cognitive, Adaptive, Content, Patterns, 2009. COMPUTATIONWORLD ’09. Computation World:, 2009, pp. 608–611.

[9] V. Desmet, S. Girbal, A. Ramirez, A. Vega, and O. Temam, “ArchExplorer for Automatic Design Space Exploration,” IEEE Micro, vol. 30, no. 5, pp. 5–15, 2010.

[10] V. Zaccaria, G. Palermo, F. Castro, C. Silvano, and G. Mariani, “Multicube Explorer: An Open Source Framework for Design Space Exploration of Chip Multi-Processors,” in Architecture of Computing Systems (ARCS), 2010 23rd International Conference on, 2010, pp. 1–7.

[11] S. Kang and R. Kumar, “Magellan: A Search and Machine Learning-based Framework for Fast Multi-core Design Space Exploration and Optimization,” in Design, Automation and Test in Europe, 2008. DATE ’08, 2008, pp. 1432–1437.

[12] Z. J. Jia, A. D. Pimentel, M. Thompson, T. Bautista, and A. Nunez, “NASA: A generic infrastructure for system-level MP-SoC design space exploration,” in Embedded Systems for Real-Time Multimedia (ESTIMedia), 2010 8th IEEE Workshop on, 2010, pp. 41–50.

[13] F. Balarin, Y. Watanabe, H. Hsieh, L. Lavagno, C. Passerone, and A. Sangiovanni-Vincentelli, “Metropolis: an integrated electronic system design environment,” Computer, vol. 36, no. 4, pp. 45–52, Apr. 2003.

[14] A. Bakshi, V. K. Prasanna, and A. Ledeczi, “MILAN: A Model Based Integrated Simulation Framework for Design of Embedded Systems,” in Proceedings of the 2001 ACM SIGPLAN Workshop on Optimization of Middleware and Distributed Systems, New York, NY, USA, 2001, pp. 82–93.

[15] L. Thiele, S. Chakraborty, M. Gries, and S. Künzli, “A Framework for Evaluating Design Tradeoffs in Packet Processing Architectures,” in Proceedings of the 39th Annual Design Automation Conference, New York, NY, USA, 2002, pp. 880–885.

[16] D. Benavides, S. Segura, P. Trinidad, and A. Ruiz-cortés, “FAMA: Tooling a framework for the automated analysis of feature models,” in In Proceeding of the First International Workshop on Variability Modelling of Softwareintensive Systems (VAMOS, 2007, pp. 129–134.

[17] M. Mendonca, M. Branco, and D. Cowan, “S.P.L.O.T.: Software Product Lines Online Tools,” in Proceedings of the 24th ACM SIGPLAN Conference Companion on Object Oriented Programming Systems Languages and Applications, New York, NY, USA, 2009, pp. 761–762.

[18] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: NSGA-II,” Evol. Comput. IEEE Trans. On, vol. 6, no. 2, pp. 182–197, Apr. 2002.

[19] E. Zitzler, M. Laumanns, and L. Thiele, "SPEA2: Improving the Strength Pareto Evolutionary Algorithm," Technical Report 103, Computer Engineering and Networks Laboratory (TIK), Swiss Federal Institute of Technology (ETH) Zurich, 2001.

[20] J. D. Knowles and D. W. Corne, “Approximating the nondominated front using the Pareto Archived Evolution Strategy” Evol. Comput., vol. 8, no. 2.


[21] D. W. Corne, N. R. Jerram, J. D. Knowles, and M. J. Oates, "PESA-II: Region-based Selection in Evolutionary Multiobjective Optimization," in Proceedings of the Genetic and Evolutionary Computation Conference GECCO'2001, 2001, pp. 283-290.

[22] M. Sierra and C. Coello Coello, “Improving PSO-Based Multi-objective Optimization Using Crowding, Mutation and Dominance,” in Evolutionary Multi-Criterion Optimization, Eds. Springer Berlin Heidelberg, vol. 3410, 2005, pp. 505–519.

[23] A. J. Nebro, J. J. Durillo, F. Luna, B. Dorronsoro, and E. Alba, “MOCell: A cellular genetic algorithm for multiobjective optimization,” Int. J. Intell. Syst., vol. 24, no. 7, pp. 726–746, 2009.

[24] A. J. Nebro, F. Luna, E. Alba, B. Dorronsoro, J. J. Durillo, and A. Beham, “AbYSS: Adapting Scatter Search to Multiobjective Optimization,” IEEE Trans. on Evol. Comput., vol. 12, no. 4, pp. 439–457, Aug. 2008.

[25] Q. Zhang and H. Li, “MOEA/D: A Multiobjective Evolutionary Algorithm Based on Decomposition,” IEEE Trans. on Evol. Comput., vol. 11, no. 6, pp. 712–731, Dec. 2007.

[26] H. Eskandari, C. Geiger, and G. Lamont, “FastPGA: A Dynamic Population Sizing Approach for Solving Expensive Multiobjective Optimization Problems,” in Evolutionary Multi-Criterion Optimization, vol. 4403, Eds. Springer Berlin Heidelberg, 2007, pp. 141–155.

[27] E. Zitzler and S. Künzli, “Indicator-Based Selection in Multiobjective Search,” in Parallel Problem Solving from Nature - PPSN VIII, vol. 3242, X. Yao, E. Burke, J. Lozano, J. Smith, J. Merelo-Guervós, J. Bullinaria, J. Rowe, P. Tino, A. Kabán, and H.-P. Schwefel, Eds. Springer Berlin Heidelberg, 2004, pp. 832–842.

[28] A. J. Nebro, J. J. Durillo, J. García-Nieto, C. A. C. Coello, F. Luna, and E. Alba, “SMPSO: A New PSO-based Metaheuristic for Multi-objective Optimization,” in 2009 IEEE Symposium on Computational Intelligence in Multicriteria Decision-Making (MCDM 2009), 2009, pp. 66–73.

[29] D. H. Wolpert, W. G. Macready, “No free lunch theorems for optimization”, in IEEE Trans. on Evol. Comp., vol. 1, no. 1, pp. 67-82, 1997.

[30] E. K. Burke, M. Gendreau, M. Hyde, G. Kendall, G. Ochoa, E. Özcan, and R. Qu, “Hyper-heuristics: a survey of the state of the art,” J. Oper. Res. Soc., vol. 64, no. 12, pp. 1695–1724, Dec. 2013.

[31] J. J. Durillo and A. J. Nebro, “jMetal: A Java framework for multi-objective optimization,” Adv. Eng. Softw., vol. 42, pp. 760–771, 2011.

[32] “M-Sim: The Multithreaded Simulator.” [Online]. Available: http://www.cs.binghamton.edu/~msim/. [Accessed: 23-Apr-2014].

[33] “gem5.” [Online]. Available: http://www.gem5.org/Main_Page. [Accessed: 23-Apr-2014].

[34] E. Zitzler and L. Thiele, “Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach,” Evol. Comput. IEEE Trans. On, vol. 3, no. 4, pp. 257–271, Nov. 1999.

[35] E. Zitzler, “Evolutionary algorithms for multiobjective optimization: Methods and applications”, vol. 63. Shaker Ithaca, 1999

[36] R. Ubal, J. Sahuquillo, S. Petit, and P. Lopez, “Multi2Sim: A Simulation Framework to Evaluate Multicore-Multithreaded Processors,” in Computer Architecture and High Performance Computing, 2007. SBAC-PAD 2007. 19th International Symposium on, 2007, pp. 62–68.

[37] T. E. Carlson, W. Heirman, and L. Eeckhout, “Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulations,” in International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Nov. 2011

[38] P. Das, & A. Banerjee, “Meta optimization and its application to portfolio selection” in Proceedings of international conference on knowledge discovery and data mining, 2011.

[39] P. Krus, J. Ölvander, "Performance index and meta-optimization of a direct search optimization method", in Engineering Optimization, Vol. 5, No. 6, pp. 345-357, 2012.

[40] C. Neumüller, A. Scheibenpflug, S. Wagner, A. Beham, and M. Affenzeller, "Large Scale Parameter Meta-Optimization of Metaheuristic Optimization Algorithms with HeuristicLab Hive," in Actas del octavo Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados (MAEB), 2012.

[41] M. Stephenson, S. Amarasinghe, M. Martin and U.-M. O’Reilly, “Meta Optimization: Improving Compiler Heuristics with Machine Learning”, in Proceedings of the ACM SIGPLAN 2003 conference on Programming Language Design and Implementation (PLDI ’03), pages 77–90, San Diego, California, USA. ACM, 2003

Lucian N. VINTAN is currently Professor of Computer Engineering at "Lucian Blaga" University of Sibiu, Romania. He led the Advanced Computer Architecture and Processing Systems Research Centre of this university. He received the MSc and PhD degrees in Computer Engineering from "Politehnica" University of Timisoara, Romania. Professor Lucian VINTAN is an expert in the areas of instruction/thread level parallelism, multi-core and many-core systems, automatic design space exploration, prediction techniques in ubiquitous computing systems and text mining. He published over 100 scientific papers in prestigious journals and international conferences. His publications have acquired over 550 citations through over 350 works published in international conferences and scientific journals. He received The Romanian Academy "Tudor Tanasescu" Prize in 2005. In 2012 he was elected full member of The Academy of Technical Sciences of Romania. As a European Commission DG Information Society Expert he is actively involved in EU-funded projects, from the evaluation of proposals to the managing and reviewing of projects. Since October 2012 he has been a HiPEAC member. He has served on the technical program committee of over 110 international computer systems conferences and has peer-reviewed hundreds of research papers for numerous international journals and conferences.

Radu CHIS is a PhD student under the supervision of Professor Lucian N. VINTAN at the Technical University of Cluj-Napoca, Romania. His PhD thesis title is "Multi-objective optimization of complex computing systems using heuristic methods and domain micro-ontologies". He received the MSc degree in Computer Science from the "Lucian Blaga" Faculty of Engineering in Sibiu, Romania, and also followed economic studies at the "Lucian Blaga" Faculty of Economic Sciences in Sibiu, Romania. His main research fields are Advanced Computer Architecture, Design Space Exploration and Physics - Multilayer Thin Films. He published 9 scientific papers in various journals and international conferences (ICCP, ICSTCC, HiPEAC Summer School, Modern Physics Letters B, Acta Physica Polonica A). He also participated in the AGENT-DYSL FP6 EU Project in the "Lucian Blaga" University team for three years during his Bachelor's studies.

Muhammad Ali ISMAIL is currently Associate Professor and Director of the High Performance Computing Center at the Department of Computer and Information Systems Engineering, NED University of Engineering and Technology, Pakistan. He received his MEng and PhD degrees in Computer Systems Engineering from the same university. He pursued his post-doctorate under the Erasmus Mundus fellowship program at the Advanced Computer Architecture and Processing Systems Research Centre of "Lucian Blaga" University of Sibiu, Romania, led by Prof. Lucian VINTAN. His areas of interest include Serial, Parallel and Multi-core Processor Architecture and Programming, Cluster and Cloud Architecture and Programming, Performance Analysis of Serial and Parallel Systems, Automatic Design Space Exploration, Multi-Objective Optimization and Parallel Heuristics. He has published over 25 scientific papers in international journals and conferences, along with a USA patent.

Cristian COTOFANA is currently enrolled for a Master's degree in "Advanced Computing Systems" at "Lucian Blaga" University of Sibiu, Romania. He received his Bachelor's degree in Computer Science, with the thesis entitled "Multi-objective optimization of mono-core and multi-core processor architectures" (Supervisor: Prof. Lucian VINTAN), from the same university.