
Layered Models for General Parallel Computation Based on Heterogeneous System

Yanxiu Sheng, Lin Gui, Zhiqiang Wei, Jibing Duan, Yingying Liu
College of Information Science and Engineering

Ocean University of China, Qingdao, Shandong, China

[email protected], [email protected]

Abstract—The conventional unified parallel computation model has become more and more complicated, yet it offers weak pertinence and little guidance for the individual parallel computing phases. Therefore, this paper proposes a general layered and heterogeneous approach to parallel computation modeling. The general layered heterogeneous parallel computation model is composed of a parallel algorithm design model, a parallel programming model and a parallel execution model, each corresponding to one of the three computing phases. The properties of each model are described and the main research focuses are given. In the parallel algorithm design model, an advanced language is designed for algorithm designers, and a corresponding interpretation system based on text scanning is proposed to map the advanced language to machine language that runs on heterogeneous software and hardware architectures. A parallel method library and a parameter library are also provided to make comprehensive use of the different computing resources and to assign parallel tasks reasonably. Theoretical analysis shows that the general layered heterogeneous parallel computation model gives each parallel computing phase a clear and single goal.

Keywords—layered models; heterogeneous system; parallel algorithm design model; interpretation system; parallel programming model; parallel execution model

I. INTRODUCTION

Parallel computation has existed for many years, but it was mainly employed in the high-performance computing field. With the reduction in hardware costs, however, interest in parallelism has increased. As frequency scaling is restricted by physical constraints, and the power consumption of computers has also become a concern in recent years, multi-core processors have come to predominate in computer architecture. Researchers have also shifted their focus from clusters to multi-core CPUs and GPUs.

As is well known, parallelized programs are usually realized in three steps: 1) the algorithm designer describes the given problem as a numeric or non-numeric problem; 2) the program designer writes the parallel program in a certain parallel programming language; 3) the program runner compiles and runs the program on a specific parallel platform to complete the computation and solve the problem. The performance of the final parallel program is affected by many factors: algorithm design, program implementation, processor architecture, communication between threads, operating system, network bandwidth, and so on. To build an optimized parallel program, all influencing factors should be parameterized and taken into consideration. Since there are too many parameters representing the different characteristics of a parallel platform, the description of a single unified model becomes more and more complicated, and eventually the cost function becomes too complicated to solve. By contrast, a layered model not only fits the heterogeneous computation platform more precisely, but also simplifies the problem.

Work on parallel computation models has been carried out since the 1970s. The first-generation models focused on computation, including the PRAM (Parallel Random Access Machine) model based on SIMD and the APRAM (Asynchronous Parallel Random Access Machine) model based on MIMD. The second-generation models focused on network communication: each CPU had its own memory, and CPUs communicated via the network. The third-generation models are based on the layered memory and computing cores of modern computers. According to Chen, such a model can be divided into a parallel algorithm design model, a parallel programming model and a parallel execution model [1]. The goal and task for each layer are much clearer and simpler, which makes the model easier to implement.

This paper proposes a group of layered models that divides the procedure of parallel programming into three stage models. Section 2 introduces parallel programming and the architecture of the layered parallel computation model on a heterogeneous system. Section 3 describes the properties of each stage model, together with the main research focuses: three tools and one system. Section 4 concludes the paper.

II. ARCHITECTURE OF THE PARALLEL COMPUTATION MODEL

Parallel computers can be roughly classified according to the level at which the hardware supports parallelism: multi-core and multi-processor computers have multiple processing elements within a single machine, while clusters, massively parallel processors (MPPs) and grids use multiple computers to work on the same task [2]. Specialized parallel computer architectures are sometimes used alongside traditional processors to accelerate specific tasks.

Parallel computer programs are more difficult to write than sequential ones, because concurrency introduces several new classes of potential software bugs, of which race conditions are the most common. Communication and synchronization between the different subtasks are typically among the greatest obstacles to good parallel program performance [3].

Parallel computing is usually accomplished by breaking the problem into independent parts so that each processing element can execute its part of the algorithm simultaneously with the others. The processing elements can be heterogeneous and include resources such as a single computer with multiple processors, several networked computers, specialized hardware, or any combination of the above.

[Figure 1: a block diagram mapping the three stage models to the three phases. The parallel algorithm design model serves the phase of algorithm design (oriented to algorithm designers), who use pseudo code to describe the algorithm and build the algorithm design model. The parallel programming model serves the phase of algorithm implementation (oriented to program designers), who call the hardware interfaces and software APIs to write the program for the algorithm. The parallel execution model serves the phase of algorithm execution (oriented to the program runner), who compiles and runs the program on one of the parallel machines 1..n.]

Figure 1. Architecture of the general layered heterogeneous model of parallel computation

Fig. 1 shows the architecture of the general layered heterogeneous model of parallel computation.

In the general layered heterogeneous model of parallel computation, the whole model is divided into three sub-models: the parallel algorithm design model, the parallel programming model and the parallel execution model.

In the parallel algorithm design model, algorithm designers focus on the abstract parameters of the parallel machine. In this layer, algorithm designers need not concern themselves with the concrete realization of algorithms, which differs from traditional parallel development.

In the parallel programming model, based on the different hardware interfaces and software application programming interfaces (APIs), program designers design the corresponding parameter library and method library for the parallel program.

In the parallel execution model, through the appropriate compiler, the parallel program is compiled into machine language that runs on the corresponding hardware and software architecture.

The parallel computing model works as shown in Fig. 2. In the algorithm design layer, developers first design the parallel algorithm to describe the target problem. After being processed by the interpretation system, the parallel algorithm is interpreted into a parallel program targeting the different hardware and software architectures. Then, with reference to the current system's software and hardware parameters, cost function and calculating behavior, the system optimizes the parallel functions to improve efficiency. After being processed by the build system, the parallel parts are compiled into the machine language of the different hardware. Finally, multiple hardware architectures (e.g. cluster, multi-core CPU, multi-core GPU) work together to complete the computing tasks.

Figure 2. Process of the general layered heterogeneous model of parallel computation
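To make this flow concrete, here is a minimal C++ sketch of the pipeline, assuming invented stage types and file names (ParallelAlgorithm, interpret, build, etc. are illustrative only and not part of the paper's system):

    #include <iostream>
    #include <string>
    #include <vector>

    // Hypothetical stage outputs for the flow in Fig. 2.
    struct ParallelAlgorithm { std::string advancedLanguageText; };     // written by the algorithm designer
    struct ParallelProgram   { std::vector<std::string> sourceFiles; }; // per-architecture sources
    struct MachineCode       { std::vector<std::string> binaries; };    // per-architecture binaries

    // Interpretation system: maps the advanced language to architecture-specific programs.
    ParallelProgram interpret(const ParallelAlgorithm& a) { return { { "mpi_version.c", "cuda_version.cu" } }; }
    // Build system: compiles each source for its target hardware.
    MachineCode build(const ParallelProgram& p) { return { { "mpi_version.bin", "cuda_version.bin" } }; }

    int main() {
        ParallelAlgorithm alg{ "GPU_do{ ... }End" };
        MachineCode code = build(interpret(alg));
        for (const auto& b : code.binaries)
            std::cout << "run " << b << " on its hardware\n";
    }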

III. DETAILS ON THREE STAGE MODEL

A. Parallel Algorithm Design Model

The parallel algorithm design model provides an advanced language, a parameter library and a method library, which are optimized according to the hardware. The structure of the model is shown in Fig. 3.

Figure 3. Architecture of the parallel algorithm design model

1) Advanced language of parallel algorithm design model

The advanced language designed in the parallel algorithm design model is a simple language that describes the target problem and divides the parallel tasks in order to design the parallel algorithm. Based on general-purpose parallel machines, the advanced language abstracts the parallel instructions and describes the target problem [4]. Thus, in the parallel algorithm design layer, algorithm designers analyze the target problem, design the parallel algorithm, identify the parallel tasks in the target problem and describe the target problem using the advanced language.

Based on the different hardware resources, algorithm designers can specify whether an operation runs on the cluster, on the multi-core CPU or on the multi-core GPU by using keyword identifiers. To divide the serial and parallel tasks that run on different hardware resources, algorithm designers can specify the hierarchy of the advanced language by using command keywords.

The method library, which is defined in the parallel programming model and designed by program designers, is a set of functions that run on the general layered heterogeneous parallel computing model. With reference to the different software and hardware architectures, one parallel task corresponds to a set of parallel functions in the method library.

The parameter library's structure is similar to that of the method library; it is a set of abstract parameters describing the current system's software and hardware characteristics, cost function, calculating behavior and so on.
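A minimal C++ sketch of what a parameter library entry might look like is shown below; the field names (cpuCores, networkBandwidth, etc.) are assumptions chosen for illustration, not the paper's actual parameter set:

    #include <map>
    #include <string>

    // Hypothetical parameter library entry: abstract machine parameters that the
    // interpretation system can consult when optimizing a parallel method.
    struct MachineParameters {
        int    cpuCores;          // number of CPU cores available
        int    gpuDevices;        // number of GPU devices
        double networkBandwidth;  // MB/s between cluster nodes
        double commLatency;       // per-message latency in microseconds
    };

    // The library maps a parallel machine's name to its parameters.
    using ParameterLibrary = std::map<std::string, MachineParameters>;

    int main() {
        ParameterLibrary params;
        params["cluster-A"]  = { 64, 0, 1000.0, 50.0 };
        params["gpu-node-1"] = { 8, 2, 0.0, 0.0 };
        return 0;
    }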

For instance, suppose there is the following advanced language section:

GPU_do{
    Master{
        preDataLocate(fn);
    }
    Workers{
        MatrixMul(fn);
    }
}End

The keyword GPU_do specifies that the whole block should run on the GPU. It designates that the master process (Master) needs to send the data in file fn to all the slave processes (Workers), and that the slave processes perform the parallel task MatrixMul(fn). This task corresponds to a set of parallel functions for the different software and hardware architectures, including the function MPI_MatrixMul( ) in the MPI database of the method library, the function PVM_MatrixMul( ) in the PVM database of the method library, and other functions defined in the method library.
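The paper does not give the method library's internal layout; the following C++ sketch shows one plausible organization, a task-name-to-architecture lookup table, reusing the MatrixMul, MPI_MatrixMul and PVM_MatrixMul names from the example above (the lookup mechanism and the stand-in function bodies are assumptions):

    #include <functional>
    #include <iostream>
    #include <map>
    #include <string>

    // Hypothetical method library: each abstract parallel task (e.g. "MatrixMul")
    // maps to one concrete function per supported architecture.
    using TaskImpl      = std::function<void(const std::string&)>;
    using MethodLibrary = std::map<std::string, std::map<std::string, TaskImpl>>;

    // Stand-ins for the architecture-specific functions named in the paper's example.
    void MPI_MatrixMul(const std::string& fn) { std::cout << "MPI matrix multiply on " << fn << "\n"; }
    void PVM_MatrixMul(const std::string& fn) { std::cout << "PVM matrix multiply on " << fn << "\n"; }

    int main() {
        MethodLibrary lib;
        lib["MatrixMul"]["MPI"] = MPI_MatrixMul;
        lib["MatrixMul"]["PVM"] = PVM_MatrixMul;

        // The interpretation system would pick the implementation that matches the
        // target architecture named by the keyword identifier (here "MPI").
        lib["MatrixMul"]["MPI"]("fn");
        return 0;
    }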

If algorithm designers want to introduce a new hardware or software architecture into the general layered heterogeneous parallel computing model, they just need to add the corresponding parallel program to the method library and configure the parallel machine's parameters in the parameter library, without modifying or rewriting the original modules, thereby extending the model's structure [5]. As there is no need to be concerned about the concrete realization of parallel programs, algorithm designers can focus on the design of parallel algorithms, which greatly reduces the difficulty of algorithm design and the amount of code.

2) Interpretation system for the designed advanced language

The interpretation system consists of three functional modules: command matching, resource control and function matching. The parallel algorithm is interpreted into the corresponding parallel program, and the model optimizes the calculation parameters of the parallel method automatically.

The process of the interpretation system [6] is as follows:

• Step 1: Scan the whole advanced language file written by the algorithm designers to obtain the hierarchy of command control and resource allocation, producing a linked list (Node_0←Node_1←…Node_i…←Node_n). To make later matching more convenient, the list is linked from each child node to its parent node (a sketch of this first pass is given after these steps). The node's data type is defined as:

CommLayerNode{
    CommLayerNode* previous;
    CString type;
    int start;
    int end;
}

• Step 2: Scan the whole advanced language a second time to obtain the start and end row numbers of each code block. With reference to the results of Step 1, obtain the corresponding command and resource control types, producing an array block[num_block]. The array's element type is defined as:

DocBlockNode{
    CString ctrltype;
    CString restype;
    int start;
    int end;
    CString titleStartDef;
    CString titleEnDef;
}

• Step 3: Scan the array block[num_block] and read the method library to obtain the corresponding titles for each code block, completing the assignment of the array block[num_block].

• Step 4: With reference to the array block[num_block], scan the advanced language a third time while recording the current row number. Whenever a start (or end) row number recorded in the array is encountered, print the corresponding start (or end) title of that code block to the output file; then scan the internal block, extract every keyword in each row and analyze each word with the class decision that judges the keyword's type. Currently there are three keyword types: data type, declaration and function. When the end of the advanced language file is reached, the interpretation ends.
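The following C++ sketch illustrates one way the first scanning pass could be implemented, with std::string standing in for CString; the brace-based block detection and the firstPass function name are assumptions made for this sketch, not the paper's implementation:

    #include <iostream>
    #include <memory>
    #include <sstream>
    #include <string>
    #include <vector>

    // Step 1 node, with std::string standing in for CString.
    struct CommLayerNode {
        CommLayerNode* previous;  // link from child node to its parent node
        std::string    type;      // command/resource control keyword, e.g. "GPU_do"
        int            start;     // starting row number of the block
        int            end;       // ending row number of the block
    };

    // Hypothetical first pass: scan the advanced-language text line by line and
    // record where each control block opens and closes.
    std::vector<std::unique_ptr<CommLayerNode>> firstPass(const std::string& text) {
        std::vector<std::unique_ptr<CommLayerNode>> nodes;
        std::vector<CommLayerNode*> open;  // stack of blocks not yet closed
        std::istringstream in(text);
        std::string line;
        int row = 0;
        while (std::getline(in, line)) {
            ++row;
            if (line.find('{') != std::string::npos) {          // a block opens here
                auto node = std::make_unique<CommLayerNode>();
                node->previous = open.empty() ? nullptr : open.back();
                node->type  = line.substr(0, line.find('{'));
                node->start = row;
                node->end   = -1;
                open.push_back(node.get());
                nodes.push_back(std::move(node));
            } else if (line.find('}') != std::string::npos && !open.empty()) {
                open.back()->end = row;                          // the innermost block closes
                open.pop_back();
            }
        }
        return nodes;
    }

    int main() {
        const std::string example =
            "GPU_do{\n Master{\n  preDataLocate(fn);\n }\n Workers{\n  MatrixMul(fn);\n }\n}End\n";
        for (const auto& n : firstPass(example))
            std::cout << n->type << " : rows " << n->start << "-" << n->end << "\n";
    }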

Finally, the advanced language is interpreted into parallel programs, which are printed to the output file.



B. Parallel Programming Model

Figure 4. Architecture of the parallel programming model

The parallel programming model and the parallel algorithm design model are connected by the interpretation system. In this model, program designers' work is oriented toward the software and hardware architectures and their programming APIs. According to the size of the computation and the characteristics of the problem, as shown in Fig. 4, the parallel parts of the program are interpreted as a cluster program, a multi-core CPU program or a multi-core GPU program.


Figure 5. Common tools for parallel development

The common tools for parallel development are shown in Fig. 5. For cluster computation, the parallel development tools [7] currently used include MPI, PVM and so on. For multi-core CPU computation, the tools currently used include Windows raw threads, OpenMP, Threading Building Blocks (TBB) and so on. For multi-core GPU computation, the tools currently used include the Compute Unified Device Architecture (CUDA), ATI Stream technology, the Open Computing Language (OpenCL) and DirectCompute.
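As an example of what a multi-core CPU entry of such a method library could look like, here is a simple OpenMP matrix multiplication in C++ (the function name OpenMP_MatrixMul and the square-matrix layout are assumptions for this sketch; compile with OpenMP enabled, e.g. -fopenmp):

    #include <vector>

    // Hypothetical multi-core CPU entry of the method library: a square-matrix
    // multiply parallelized with OpenMP, one of the tools listed above.
    void OpenMP_MatrixMul(const std::vector<std::vector<double>>& a,
                          const std::vector<std::vector<double>>& b,
                          std::vector<std::vector<double>>& c) {
        const int n = static_cast<int>(a.size());
        #pragma omp parallel for          // distribute rows of the result across threads
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) {
                double sum = 0.0;
                for (int k = 0; k < n; ++k)
                    sum += a[i][k] * b[k][j];
                c[i][j] = sum;
            }
    }

    int main() {
        const int n = 4;
        std::vector<std::vector<double>> a(n, std::vector<double>(n, 1.0));
        std::vector<std::vector<double>> b(n, std::vector<double>(n, 2.0));
        std::vector<std::vector<double>> c(n, std::vector<double>(n, 0.0));
        OpenMP_MatrixMul(a, b, c);
        return c[0][0] == 8.0 ? 0 : 1;    // each entry is 4 * (1.0 * 2.0) = 8.0
    }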

C. Parallel Execution Model

Figure 6. Architecture of the parallel execution model

The parallel execution model and the parallel programming model are connected by the build system. As shown in Fig. 6, programs written with high-level parallel tools (such as MPI, CUDA, etc.) are compiled into machine language to run on different hardware.

As shown in Fig. 7, the common hardware environments for the execution of parallel programs can be divided into clusters, multi-core CPUs, multi-core GPUs and so on. In the parallel execution model, programs are compiled into machine language by different build systems. Ultimately, the parallel programs run on the appropriate hardware and software architectures [8].

Figure 7. Common parallel computing environments

With the computing parameters optimized by the interpretation system guiding task partitioning, thread scheduling, communication management, synchronization control and priority control, the execution efficiency of parallel instructions is improved.
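For instance, a task-partitioning decision could read a machine parameter and split the work accordingly; the C++ sketch below assumes a hypothetical partition helper driven by the CPU core count taken from the parameter library:

    #include <algorithm>
    #include <iostream>
    #include <vector>

    // Hypothetical use of an optimized computing parameter: derive the number of
    // sub-tasks from the CPU core count recorded in the parameter library, so that
    // task partitioning matches the hardware.
    struct Range { long begin; long end; };

    std::vector<Range> partition(long items, int cpuCores) {
        int tasks = std::max(1, cpuCores);           // one sub-task per core
        long chunk = (items + tasks - 1) / tasks;    // ceiling division
        std::vector<Range> ranges;
        for (long start = 0; start < items; start += chunk)
            ranges.push_back({ start, std::min(items, start + chunk) });
        return ranges;
    }

    int main() {
        // Suppose the parameter library reports 8 CPU cores for this machine.
        for (const Range& r : partition(1000, 8))
            std::cout << "[" << r.begin << ", " << r.end << ")\n";
    }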

IV. CONCLUSION

Following the development and evolution of parallel computing models, and based on the model's functions and object-oriented features, it is both sufficient and necessary to divide the parallel computing model into three layers: the parallel algorithm design model, the parallel programming model and the parallel execution model. The advanced language and the corresponding interpretation system were designed to achieve high efficiency, accuracy and universality in parallel development. The parallel method library and parameter library were also provided to make comprehensive use of the different, heterogeneous computing resources and to assign parallel tasks reasonably, reducing the complexity of parallel development. In summary, the general layered heterogeneous model of parallel computation promises wide application prospects and significant social benefits.

ACKNOWLEDGMENT

This work is supported by the key technology research of a smart home information platform based on pervasive computing (Qingdao Municipal Science and Technology Development Plan, 10-3-4-1-1-jch).

REFERENCES

[1] X. Z. Qiao, S. Q. Chen, and L. T. Yang, “HPM: a hierarchical model for parallel computations,” International Journal of High Performance Computing and Networking, vol. 1, 2004, pp. 117–127.

[2] J. Dean and S. Ghemawat, “MapReduce: Simplified data processing on large clusters,” Proceedings of the 6th Symposium on Operating System Design and Implementation, San Francisco, USA: USENIX Association, 2004, pp. 137–150.

[3] T. G. Mattson, B. A. Sanders, and B. L. Massingill, Patterns for Parallel Programming, USA: Addison-Wesley Professional, 2004.

[4] X. H. Sun, “Scalable computing in the multicore era,” Proceedings of the Inaugural Symposium on Parallel Algorithms, Architectures and Programming, Hefei: University of Science and Technology of China Press, Sep. 2008, pp. 1–18.

[5] Yunquan Zhang, Guoliang Chen, Guangzhong Sun, and Qiankun Miao, “Models of parallel computation: a survey and classification,” Frontiers of Computer Science in China, vol. 1, no. 2, 2007, pp. 156–165.



[6] Guoliang Chen, Guangzhong Sun, Yunquan Zhang, and Zeyao Mo, “Study on parallel computing,” Journal of Computer Science and Technology, Special Issue Dedicated to the 20th Anniversary of NSFC, vol. 21, no. 5, 2006, pp. 665–673.

[7] M. Isard, M. Budiu, and Y. Yu, “Dryad: distributed data-parallel programs from sequential building blocks,” ACM SIGOPS Operating Systems Review, vol. 41, no. 3, 2007, pp. 59–72.

[8] Guoliang Chen, Qiankun Mao, Guangzhong Sun, and Yun Xu, “Layered models of parallel computation,” Journal of University of Science and Technology of China, vol. 38, June 2008.
